2. Introduction to the TICK Stack and InfluxEnterprise

In this webinar, Michael DeSa will provide an introduction to the components of the TICK Stack and a review of the features of InfluxEnterprise and InfluxCloud. He will also demo how to install the TICK stack.


Here is an unedited transcript of the webinar “Introduction to the TICK Stack and InfluxEnterprise.” This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.

• Chris Churilo: Director Product Marketing, InfluxData
• Michael DeSa: Software Engineer, InfluxData

Michael DeSa 00:06.197 All right. Thank you, Chris. So, as Chris mentioned, today we’re going to be talking about just an introduction to the TICK stack. Really, just talking about what the TICK stack is, what the components are, what are the roles and the responsibilities of each of them. And then, getting into a little bit of when would I need, say, something like InfluxEnterprise or InfluxCloud, just to kind of bridge the gap of, what do I need out of all this? So, with that, the agenda today is to go over what the TICK stack is, so I’ll just cover each of the components, talk about maybe some performance aspects of each of the components, and how they all operate together. We’re going to go over what InfluxEnterprise is and what InfluxCloud is. So, kind of loose terms that get used, maybe conflated with some other things. We’ll talk about them in more detail. And then, finally, at the end of all this, we’re going to go through a demo where we just kind of show each of the components of the TICK stack. So, just everything kind of used together. Kapacitor, InfluxDB, Chronograf. And everything kind of—and Telegraf—all in kind of one little package. We have a repository that’ll be available for you to spin this up on your own, and just give it a quick demo before maybe trying it out completely in your own system.

Michael DeSa 01:36.876 So, InfluxData and sort of the TICK stack is a modern open source platform, built from the ground up. All of the components were built from scratch, by us. And the idea is to be a modern engine for both metrics and events. So, as things have evolved, we’ve hit these kind of new workload requirements. The amount of data that we’re generating is really more than older systems can handle. So, we have very, very high volume of real-time writes. It’s maybe okay if you lose a point here or there. Consistency is maybe not the most important thing. You just want to be available to take in an influx of data—and that was pun intended there. And so, we also saw that there’s a lot of other metrics platforms out there that either do just events, which we call sort of irregular time series, or metrics, which are just kind of regular time series. And we wanted to kind of bridge the gap between the two. And we wanted a system that allowed for both regular and irregular, depending on what you wanted to do, without kind of sacrificing large amounts of disk space or performance just to get both on the same system. And then the final part is, with these kind of time series or time-based workflows, you usually don’t want to keep around all of the data that you’ve ingested for an indefinite period of time. So, after a month, I want to expire my data. Or after a year, I want to expire my data. And so, we thought about how that comes into play and how you should be able to do that efficiently in our platform.

Michael DeSa 03:28.005 We also noticed that in just a regular database, something that’s not particularly a TSDB, you often have time-based queries where you’re doing queries based over some time period. And you can do these types of things in SQL or MongoDB, or any of this. But it’s a little bit more difficult to do. And we wanted that to be particularly easy to do in our platform. So, being able to do scans of data over some time period and sharding data based off of time. We wanted that to all be easy. And then, the final point here is, we wanted something that was scalable and highly available. So, if you needed to scale out to a massive cluster to take in your load, you could do so. Or if you need a highly available situation where you have multiple nodes that you want to be resilient to a failure, we wanted this to be an option that our users have. And so, for that reason, we built sort of our clustering into that. And so, we’ll kind of go through each of the components and how they all come into play in more detail, but this is just kind of a high-level overview of what InfluxData is and the product we built.

Michael DeSa 04:46.932 So, we’re going to go over kind of an architectural approach, just talking about each of the various components. Here we have kind of a diagram of the way you can really think about the components of the TICK stack. So, to the left here, we have Telegraf. Telegraf is responsible for collecting or generating the data that we feed onto the rest of the system. We have many users that don’t necessarily use Telegraf, but I’d say a large majority of our user base does use Telegraf with the rest of the TICK stack. So, Telegraf is for generating metrics, collecting metrics. So, if you’re familiar with other collectors out there, you can think of it kind of like collectd, or Diamond, or things like this. So, once that’s happened, we feed data into InfluxDB and into Chronograf. Namely, you’re using InfluxDB and Chronograf to kind of analyze the data, whether that be queries that you’re writing yourself or dashboards that you’re consuming data through Chronograf with. The middle part there is really about, what do you do with your data? What do you personally look at? And it’s usually kind of user-driven things. Once data has come through that section, you may end up in the Kapacitor section, where you can think of this as what is monitoring or taking actions on your data. So, when my CPU threshold hits the 99th percentile, kick off an alert to somebody, or send something to PagerDuty, or send out a Slack alert or an email. Or spin up more containers, or spin up more pods, or—I don’t know, whatever it be. You can really use Kapacitor to kind of do these types of workloads. So, I would call it sort of monitoring and making actions or taking actions. So, that encompasses that top section there, the Telegraf, InfluxDB, Chronograf, and Kapacitor. That encompasses our open source offering.

Michael DeSa 07:01.586 So, that is the TICK Stack. The TICK Stack is 100% free to use, and every component there is 100% open source. So, InfluxEnterprise is that bottom section that you see down there, which is clustering, security, and high availability. So, the way we kind of differentiate our closed source from our open source is, we really think that the things that are necessary to run the system in production are the things we would like to sell. So, that’s scale-out performance. That’s security. And that’s high availability. And so, those are—that bottom section there encompasses our closed-source offerings. And we’re going to go through each of these components in more detail just to give you a little bit more of a broad picture of what’s going on here. So, one question I get asked fairly frequently is, well, why TICK when you have, say, Elasticsearch and the ELK Stack, or you have Graphite, or you have Prometheus and its ecosystem? Why would someone choose to use the TICK stack over any of these other options? And I think it’s a very good question. And I think it’s a question that we really try to focus on a lot here. At least, the answers are things that we focus on a lot here and we keep as sort of core tenets of our business.

Michael DeSa 08:26.174 So, the first one is, we wanted something that’s easy to use. And that means easy to operate, easy to get started with, and just kind of all around should not be hard to use. The query language should be familiar. Working with each of the components, setting up all of the components together, all of this should be very easy to do. So, ease of use and developer happiness are kind of our core tenets. And if we ever notice that we’re sort of straying in one direction or the other too much, we usually try to bring it back to this ease of use concept. The ease of getting started concept. Kind of in line with that is, we didn’t want it to have any external dependencies. There’s many sorts of systems out there where you’ll either need to run an HBase or a Cassandra on top of the TSDB that you’re working with. The Time Series Database, that is, TSDB. And we didn’t want something that was that. We wanted a simple binary that you can just kind of run anywhere on any system. So, Linux, ARM platforms, or on OSX, Windows, whatever you want. A 32-bit, 64-bit. Whatever you wanted to run it on, we wanted something that would work everywhere. And we wanted it to have no dependencies. So, that’s kind of—if you want to run it on a very sort of small platform and have Telegraf running on a bunch of ARM things, or even InfluxDB running near the edge of your system, you could do so without requiring a lot of resources.

Michael DeSa 10:05.870 We also wanted to allow for both metrics and events. This is something I mentioned earlier. So, that comes down to regular and irregular time series. And the reason for this is, there are many systems out there that focus predominantly on regular or predominantly on irregular. So, Prometheus, for example, you can put irregular data into Prometheus. It’s just a little bit more painful and not necessarily the easiest thing to do, in my opinion. You can do it, but it’s really intended for regular data. The next point below is, we wanted something that was horizontally scalable. So, scale-out performance as well as available in some highly available fashion. And then, the final point here is, it’s not just a single tool. The TICK Stack is not just a single piece of software. It is multiple pieces of software. It’s a collector. It’s a database. It’s the visualization component, as well as sort of an action or alerting component, or a processing and ETL component. And the reason why we think that having all of these components is better than just making any one of them is, we think that if you control each of these components, you can really leverage them to provide a lot better of an out-of-the-box experience than you may have with some other products. So, if you want to get up and running in a couple minutes and have the baseline of what you would want out of a monitoring system, you can do so with the TICK Stack very easily.

Michael DeSa 11:49.126 So, we’re going to move on from here to just talking about each of the components in a bit more detail. We’re going to start with the T, which is Telegraf. So Telegraf is a sort of plug-in driven server agent for collecting and reporting metrics. These metrics can be both regular and irregular. It can do things like pull from third-party APIs, or it can pull from different services, or it can even poll the system that it’s currently running on to get some metrics. And so, there’s two ways that we really see people using Telegraf, and that is to monitor all of the servers in their sort of environment, as well as the various kind of services, like their Postgres database, their Mongo database, their Elasticsearch, any of these things. You can really sort of have Telegraf monitor each of them. And the way that people typically configure this is, they have Telegraf run on all of their servers. And then on a separate server, they have one that maybe monitors all of their databases and sort of other tools that they’d like to have monitored. It has a very minimal memory footprint. So, it requires almost no memory. It’s written entirely in Go and compiles into a single binary without any dependencies. And so, the way we kind of see Telegraf progressing is, we want to allow people to more easily write plug-ins. And so we’ve done a little bit of work on a plugin architecture. We’ve also done a little bit of work on doing dynamic configuration of Telegraf, so that if you, say, decide that you want to start monitoring your system in some other way, you can do so via hitting an API rather than having to redeploy Telegraf or re-update the configuration on all of your Telegrafs and then restart the process. So, to make it a little bit easier, that’s kind of where we see Telegraf progressing.
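To make the plug-in driven point concrete, here is a minimal sketch of what a `telegraf.conf` can look like. The plugin names and keys are standard Telegraf ones, but the interval, database name, and URL are placeholder values you would adjust for your own setup:

```toml
# Agent-wide settings: how often every input plugin collects.
[agent]
  interval = "10s"

# Input plugins: collect CPU and memory stats from the local host.
[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]

# Output plugin: write everything to a local InfluxDB instance.
[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "telegraf"
```

Monitoring a service like Postgres or Docker is the same pattern: enable the corresponding `[[inputs.*]]` block and point it at the service.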

Michael DeSa 14:02.709 The next part in the TICK stack—we’re going to hop over the “I” and go straight to the “C”—is Chronograf. And Chronograf is our visualization tool, and data exploration tool, and dashboarding tool. So, if you need to do sort of administrative capabilities like create databases, users, make alerts, any of these things, set up roles for users, monitor certain types of queries that are going on, kill things that are running too long, you can do all of this with Chronograf. So, that’s mostly kind of the administrative capabilities of Chronograf. If you want to explore your data and just kind of do ad hoc exploration, or write queries and see what data comes back, you can do that exact thing with Chronograf as well. And just so you can get a feel, so you don’t have to hop into a CLI and start sort of writing queries by hand, you can use Chronograf to kind of help guide you in that way a bit more. And say you get to a query that you really like, you can very easily kind of export this query to a dashboard and kind of have it persist indefinitely.

Michael DeSa 15:15.336 And so, on top of that, since we know that most of the users that are using Chronograf are also using Telegraf, if we can detect that there are any hosts that are currently using Telegraf, we will generate sort of prebuilt dashboards for you and display them on the host. So, if you’re using—we can see you’re using Docker. We can see you’re using Telegraf. We can infer the types of things you might be doing and the types of graphs that you might be interested in, and just have kind of auto dashboards there for you. So, there’s no fiddling around with an empty dashboard and sort of clicking around. There’s no going to the Grafana store and figuring out which dashboard has the type of metrics that I want for Telegraf. It kind of just works out of the box. There’s no thinking about it. It just does what you expect it to. And so, where we see Chronograf kind of evolving to is even more of this kind of UI layer for the TICK stack. So, we see it as a place that you’ll build and manage TICK scripts. We’ll talk about what TICK scripts are in just a moment. We see it as a place where you can do all your user management, and any new features that come into the TICK stack or into InfluxDB or Kapacitor or Telegraf, we really see Chronograf as the place where you will consume those new features. And that’s kind of how we see it progressing.

Michael DeSa 16:45.164 Next, we have InfluxDB. As I’m sure most of you are familiar, we kind of get the majority of our users coming from InfluxDB. So, they start with InfluxDB and they kind of move on to the rest of the pieces, once they have a good experience. And InfluxDB is a custom, high-performance data store specifically designed for time series data. As many of you know, if you’ve been with us since before the 1.0 or 0.13 versions of InfluxDB, we rewrote our storage engine to specifically work with time series data and work well with time. It’s a variant of an LSM tree that we kind of modified to work specifically for time. And the reason why we—it’s usually not a great idea to invest a bunch of time into developing a storage engine. I think there’s a famous quote out there that says, “It takes between 10 and 15 years for a storage engine to really take,” to have gotten rid of all the bugs. But with that in mind, we really thought that, and we saw that we were needing this new storage engine to allow for the types of workflows we wanted our users to be able to do. And so, for that reason, we dedicated a decent amount of time to developing our own storage engine. It’s written entirely in Go, and compiles to a single binary. And no external dependencies, so if you’re running on ARM or you’re running on Linux/x86, any of these, or if you’re running on Windows, you can really run InfluxDB anywhere. So, if you want to run it on a very small, low-power device, you can do that as well as on a massive server. It’ll work on both of them.

Michael DeSa 18:48.499 We wanted to have an SQL-like query language. It’s a very common thing to do in the database world, but we wanted to have it so that it was very easy for our users to get started with. That being said, we do hit some roadblocks in the limitations of SQL, particularly for time series, but it is something that we find users appreciate just because it’s a very easy place to get started. We offer tagging of data, and unlike some other TSDBs out there, we allow for arbitrarily many tags. At the moment, these tags currently live in an index in memory, but there’s some work going on right now. It’s called TSI, if you’re interested. It’s a time series index that will be a disk-based index for tags, which will allow users to have many millions or billions of individual series. At the moment, there’s a little bit of a hard restriction in kind of the low millions. And we’ll talk about what kind of performance you can expect from InfluxDB in just a moment. We wanted something that had retention policies, so you could sort of automatically expire stale data, and we wanted continuous queries so you could roll up data or pre-compute any queries that you may be querying more frequently than you anticipated.
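Retention policies and continuous queries are both created with plain InfluxQL statements. A hedged sketch, where the database, measurement, and policy names are illustrative rather than anything from the talk:

```sql
-- Automatically expire raw data after 30 days.
CREATE RETENTION POLICY "one_month" ON "telegraf"
  DURATION 30d REPLICATION 1 DEFAULT

-- Pre-compute 5-minute CPU averages into a rollup measurement.
CREATE CONTINUOUS QUERY "cq_cpu_5m" ON "telegraf"
BEGIN
  SELECT mean("usage_user") AS "usage_user"
  INTO "cpu_5m"
  FROM "cpu"
  GROUP BY time(5m), "host"
END
```

The continuous query runs on the server on its own schedule, so dashboards can read the cheap `cpu_5m` rollup instead of re-aggregating raw points.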

Michael DeSa 20:12.298 Just to give you an idea of what kind of performance—or how we think about performance here, we kind of think about these three categories: low, moderate, and high. We would consider sort of a low load would be if you’re doing less than 100,000 writes a second, you have less than 25 concurrent queries and less than 100,000 series. So, the way you can kind of think about this is how many Telegrafs would this be. And roughly, it would be about, I want to say, somewhere in the range of 100 to 500 Telegrafs would be what we consider in this low category here, with associated dashboards that are queried by multiple users. This is kind of the way we think about it. For moderate load, we’d be starting to get into things where it’s greater than 100,000 writes a second, and by writes, I mean individual lines, which would probably be batched into multiple write requests. We also think greater than 25 queries per second would probably be a little bit more in the moderate section, and then more than 100,000 series. So, that puts it somewhere in the 500 to 2,000, 3,000 Telegrafs reporting to InfluxDB. And we start to get into the higher load section, where we’re talking about greater than 500,000 writes a second, greater than 50 concurrent queries, and greater than one million cardinality or individual series. And keep in mind that these are kind of the current values. With TSI coming out, we expect the cardinality to kind of increase dramatically. We’ve been doing testing with TSI where there’s on the order of tens or hundreds of millions of series in a single InfluxDB instance. And we really view this as kind of the catalyst for monitoring containers in the way that you really want to.
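For reference, a “write” in these numbers means one line of line protocol, and a single HTTP request typically batches many of them. A minimal example against a local instance (host, database, and values are made up):

```shell
# Two points batched into one write request (InfluxDB line protocol).
curl -i -XPOST 'http://localhost:8086/write?db=telegraf' \
  --data-binary 'cpu,host=server01 usage_user=12.5
cpu,host=server02 usage_user=40.1'
```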

Michael DeSa 22:22.363 So, at the moment, it’s a little bit hard to index on container ID, which is something that is very common for our users to want to do. And we’re working towards getting there. And I’d imagine that in the next release or so, you should see that coming out. It’s currently in beta. So, if you want to do some testing on it, we welcome you to do so. The next question that I get all the time is, well, what kind of hardware should I run InfluxDB on? And it’s a little bit hard. It depends a lot on your use case, but we can give you some general guidelines. Low load, you shouldn’t need more than four cores, four to eight gigs of RAM should be sufficient, and a disk that has about 500 IOPS. The question I get after that all the time is, “Well, how much disk do I need?” That’s a little bit hard to give you a specific answer to, obviously. But you end up using between about 2.5 and 3.5 bytes per point. So, think about how many points you have, and you can get a little bit of an idea of how much disk you’ll end up needing. So, every billion points you intend to write, the database ends up consuming about 3.5 gigs on disk after compression. So, figure out how long it takes to write a billion points, and extrapolate from there.
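Those ballpark figures are easy to turn into a back-of-envelope calculation. The sketch below just encodes the 2.5 to 3.5 bytes-per-point range quoted in the talk; the example write rate and retention window are arbitrary.

```python
# Rough disk sizing from the talk's figure of ~2.5-3.5 bytes per point
# after compression. Back-of-envelope only; real usage depends on schema.
BYTES_PER_POINT_LOW = 2.5
BYTES_PER_POINT_HIGH = 3.5
SECONDS_PER_DAY = 86_400

def disk_estimate_gb(points_per_second, days):
    """Return a (low, high) on-disk estimate in GB for a sustained write rate."""
    total_points = points_per_second * SECONDS_PER_DAY * days
    return (total_points * BYTES_PER_POINT_LOW / 1e9,
            total_points * BYTES_PER_POINT_HIGH / 1e9)

# e.g. a "low load" instance at 100,000 points/s, retained for 30 days:
low, high = disk_estimate_gb(100_000, 30)
print(f"{low:.0f}-{high:.0f} GB")  # roughly 648-907 GB
```

The same arithmetic reproduces the quoted rule of thumb: one billion points at 3.5 bytes each is about 3.5 GB on disk.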

Michael DeSa 23:48.870 For a moderate workload, we expect a little bit more than four cores. Ideally, eight, but six would be sufficient. And again—whoops, a little bit of a typo there. You’re going to want 8 to 16 gigs of RAM, not 500 IOPS as it’s written there. My apologies for that. And you want a disk with between 500 and 1,000 IOPS, depending on how much you’re doing. And then, for a high workload, you’re going to want eight cores, not eight per second. My apologies again for the typo. And you’re going to want between 16 and 32 gigs of RAM. Obviously, if you have a more complex workload, you can scale any one of these dimensions. This is just giving you a high-level overview. And you want about 1,000 IOPS on that disk. So, again, this is kind of just high-level things. If you want to know more detail, I suggest that you reach out to our team here or reach out on our community website, and we can help guide you a bit more. These are just kind of rough ballpark numbers. And I usually don’t like to give these kinds of rough ballpark numbers, but people usually find them helpful. And so, that’s why we’ve gotten them up here.

Michael DeSa 25:08.058 So, now that we’ve talked about InfluxDB, we kind of move on into Kapacitor. Kapacitor is really the open-source data processing tool. So, if you want to process time series data, so you want to create alerts or you want to do kind of anomaly detection or you want to transform your data and do some ETL, detecting anomalies, any of these things or all of these things can be done in Kapacitor. So, it can operate in one of two ways. It can either process streams of data. A stream of data just means that any write that InfluxDB sees will be mirrored to Kapacitor. Or you can have it work in kind of the opposite direction, where Kapacitor will be polling InfluxDB. And that would be a batched job. So, streaming data gets written to Kapacitor, whereas in batch, Kapacitor pulls data from InfluxDB. So, it can perform transformations on that data. It technically has a loopback feature now, so I believe it is technically Turing complete, though you do have to do some bending over backwards to make it work the way you intended it to, but I have done it for fun. And you can store the data that you transform back into InfluxDB. If you ever find that there’s something that’s missing from Kapacitor or some feature that you really wish was there, we allow for what are called user-defined functions. So, these are basically just programs that you write yourself that interact with Kapacitor over some defined protocol. You can look up what that is. We do it over Unix sockets, and I believe over the network. I could be wrong on that, though. So, you can kind of write your own custom logic and have it interact with Kapacitor data. And at the moment, we support Python 2 and 3 and Go. I know a number of other ones have been written in the community, but they have yet to be merged back into the core code base.
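To give a flavor of what a task looks like, here is a hedged TICK script sketch of a stream task that alerts when mean CPU usage crosses a threshold. The measurement, field, threshold, and log path are illustrative, not from the talk:

```
// Stream task: mirror cpu writes, window them, alert on the mean.
stream
    |from()
        .measurement('cpu')
        .groupBy('host')
    |window()
        .period(1m)
        .every(10s)
    |mean('usage_user')
    |alert()
        .crit(lambda: "mean" > 90.0)
        .log('/tmp/cpu_alert.log')
```

A batch task would instead begin with `batch |query(...)`, polling InfluxDB on a schedule; either kind is registered with something like `kapacitor define cpu_alert -tick cpu_alert.tick -dbrp telegraf.autogen`.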

Michael DeSa 27:23.504 And on top of that, Kapacitor integrates with any of the kind of external services or alert services that you may want to use. HipChat, OpsGenie, Alerta, PagerDuty, Slack, email. The list is quite extensive. And if you ever find that there’s something that isn’t on there that you wish was, you can reach out to us or open up a feature request on our repository. Send us an email, open something on the community. We’d be more than happy to answer. So, the way we kind of think about load in Kapacitor is a little bit different than how we think about it in InfluxDB. But we’re going to do some categories for that as well. Jobs or units of work are defined in Kapacitor by what is called a task. And tasks are basically kind of just individual things defined by a TICK script. We would consider you to have low load if you have less than 10 tasks and less than 100,000 writes a second that are going into Kapacitor, and your cardinality is less than 100,000. That would be what we would consider to be low load. Your load starts to get a little bit more moderate if you have less than 100 tasks but greater than 10. It’s definitely a little bit more than low, so we kind of put things in that space. You’re also going to be doing greater than 100,000 writes a second. And then the cardinality is going to start creeping up there a little bit more, maybe into the couple hundreds of thousands. And the reason why that kind of makes an impact, and makes it a little bit more moderate than low, is that Kapacitor operates entirely in memory. And so, if you have 100,000 things, there’s going to be 100,000 buckets in your Kapacitor TICK script, essentially, is kind of the way you can think about that. And then we consider you to be in high load if you’re doing more than 100 tasks, greater than 300,000 writes a second, and greater than 1 million cardinality.
And so, determining which of these categories you fall into can be a little bit of a chore, and so you may find that you end up scaling up, starting with your Kapacitor instance being small, and then kind of scaling up as you become more and more familiar and intimate with your system.

Michael DeSa 29:59.959 So, now that we’ve talked about the TICK stack, those are all of our open source offerings. You may be interested to know what is InfluxEnterprise and what is InfluxCloud. So, InfluxEnterprise is really our clustering capabilities. So, the clustering comes for both Kapacitor and InfluxDB, and that can be either highly available or scale-out. So, if you need to just do a million tasks, you can scale that out with Kapacitor. If you need to have a million series, you can do that by scaling out InfluxDB as well as operating these things in a highly available fashion. So, those are the two ways. That’s really the core thing of what we sell. On top of that, you can start to get things like enhanced security, so role-based access control, restricting certain users by tags, any of the things like this. And we’ve put a bunch of sort of testing into making sure that these things operate at scale and under load. And we’ll help out a bit more with your production and deployment so we can help you design your system. We’ll help you kind of walk through the process of setting things up, and give you any kind of gotchas or issues that we’ve noticed people come across.

Michael DeSa 31:32.593 Just to give you—to sort of break things down into categories here. Our product offerings are InfluxCloud, which has an open-source core, is extensible, allows for regular and irregular, is clustered, and is highly available or scalable, whichever one you’d prefer to do. We have backup and restore procedures that you can use, and the note there says advanced. To give you an idea of what types of backup and restore processes are available, with InfluxDB, there’s a very simple one: you can back up basically everything in a particular database, or the instance entirely. When you start getting into InfluxEnterprise, you get more advanced backup and restore procedures, like backing up from a time period, from a specific shard, from a specific retention policy, restoring into a new database or restoring into a different retention policy. You can really kind of move data around however you please. Our InfluxCloud runs on AWS, and we will manage the entire instance for you, and you’ll get 24/7 support, I believe. And that’s kind of the way you can think about that. InfluxEnterprise is for our users that want all of the features of InfluxCloud but would really like to run it predominantly on their own servers, so on-premise. They want to run it behind their own firewall, in their own network. So, for those types of people, we would recommend using InfluxEnterprise. And if you’re just getting started and you want to just play around for a little bit, try things out, get your feet wet, you can go to either InfluxCloud or to the open source TICK stack. That’s kind of the way you can think about how our products break down.
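On the open source side, the simple backup and restore he mentions is driven from the `influxd` CLI. A sketch, assuming the 1.x portable backup format (flag spellings can differ between versions; paths and names here are made up):

```shell
# Back up a single database in the portable format (InfluxDB 1.5+).
influxd backup -portable -database telegraf ./backups/telegraf

# Restore it into a differently named database on the same instance.
influxd restore -portable -db telegraf -newdb telegraf_copy ./backups/telegraf
```

The finer-grained Enterprise operations (per-shard, per-retention-policy, point-in-time) are exposed through the cluster tooling rather than this single-node CLI.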

Michael DeSa 33:32.527 So, from here, I’m actually going to do a short demo of our various tools. And to do that, I’m going to start sharing my screen. All right. Hopefully, everybody will be able to see my screen. I’m going to use this InfluxData Sandbox, which is just kind of a sandbox that you can use to spin up the entire TICK stack very easily. And so, let me make sure these things are a little bit bigger so everybody can see. All right. And I’m going to move things around. And there we go. So, if you go to the sandbox, it requires Docker and Docker Compose. Essentially what it’s going to do is just spin up all of the components. You just change into that directory and then run ./sandbox up. And when I do that, we’re going to see—it’s going to spin up InfluxDB, Telegraf, Kapacitor, Chronograf and some documentation. And it takes me directly to this page. I’ve actually been here one time before, and so I had a little bit of this set up for me already. Here we can see—oh, there we go. Let’s not do—let’s do the past five minutes. So, now we’ve spun up all of our containers and we’ve spun up all of our services. And we can start to get a feel for what Chronograf is like. So, this right here is my host list view of Chronograf. In this case, I only have a single host. But I can see the various apps that are running on my host. So, these are my Docker metrics. I can do my InfluxDB metrics, as well as my system metrics. So, these are all things that I can sort of look at my instance right off the bat. Since we’ve detected that all of these things are on our host, we can click on the host and just get a full dashboard of all of them. And it’s doing the past hour, but let’s do past five minutes so we can actually see stuff. And we can see exactly all of our metrics here. We can see how many Docker containers are running. And we can see how many Docker images there are.
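The sandbox used in this demo lives in the influxdata/sandbox repository on GitHub. Assuming Docker and Docker Compose are installed, bringing it up looks like:

```shell
git clone https://github.com/influxdata/sandbox.git
cd sandbox
./sandbox up   # builds and starts influxdb, telegraf, kapacitor, chronograf
```

Once the containers are up, Chronograf is served on a local port and opens in the browser, which is the host list view shown in the demo.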

Michael DeSa 35:56.890 If I click over to data explorer, I can start to build my functions here. So, I’m going to do usage user—oops. Can you still hear me? I think you can. Yes, you can. Awesome. So, let’s do something more interesting than what we’re currently doing. Docker memory, let’s get rid of Docker memory. So, here I’ve written a query where we’re grouped by—grouped by CPU and we’re going to look at usage user. So, here, we’re seeing that there was a big spike. And then kind of just gradually going up. And now we’ve got some more stuff going on here as well. So, this is where you can start to view your data and get a feel for it. If, instead of having this graph, I want to just view the raw results, I can do so by clicking on this table view over here. And this just gives me an idea of what my instance looks like. If I’d like to make a new dashboard, I can do so by clicking here. And then I can do stuff like edit the graph, and we’ll say add a query. Click here. And we’re going to monitor InfluxDB CPU total, and we’re going to look at the usage. [inaudible] some place more interesting, the usage idle. And then I can make that. I can resize the plot however I’d like it to be. And then, if I want to, I can rename this to be CPU usage. And then I’ve got my nice little graph here. If I wanted to make more of them, I could do so. I can change the time frame to whatever I please, as well as the refresh rate. If I want to rename the dashboard CPU Dashboard, call it CPU Dash, I can do so as well. Now, if I come to dashboards, I can look at my CPU dashboard and click on the ones that I find necessary.

Michael DeSa 38:28.644 Here, I've gone ahead and set up a Kapacitor instance as well. I can define rules in Kapacitor. Rules end up just being tasks that will trigger alerts. So, I can do something to the effect of, let's do system, and let's go to load 1. And let's group by host—in this case, we only have one. So, we're going to say, if load 1 ever stops reporting for some reason, we want to trigger some kind of alert. And to do that, we're going to say, if data is missing for 10 minutes, we know that there's a problem. So, if this host has not reported any system load 1 metrics for 10 minutes, we want to kick off a deadman alert. And we can do things like notify a specific user with the specific details of the task. So, let's just post to http://localhost:5555/alert. And we're going to include the ID, name, task, and group by, and that's going to be the body of our alert message. We didn't title it, so let's give it a title: System Load One Dead Man. And then we can save. Now, if I go back to my rules, I can see System Load One Dead Man, I can see what type it is, I can toggle whether it's enabled or disabled, and I can see that it's going to trigger an alert over HTTP.
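The rule built in the Chronograf UI above corresponds roughly to a TICKscript like this. Treat it as an illustrative sketch rather than the exact script Chronograf generates — the measurement name and alert URL match the demo, but the message template is an assumption:

```
// Watch the system measurement, one stream per host
var data = stream
    |from()
        .measurement('system')
        .groupBy('host')

// Fire if fewer than 0 points (i.e. no data at all) arrive in any 10m window
data
    |deadman(0.0, 10m)
        .id('System Load One Dead Man')
        .message('{{ .ID }} is {{ .Level }}: no system metrics for 10m')
        .post('http://localhost:5555/alert')
```

The deadman node is just an alert node with a built-in "did anything report?" check, which is why it can reuse alert handlers like `.post()`.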

Michael DeSa 40:21.657 So, this is all great. There are a couple of other things you can do with Chronograf. You can do database management. So, if I want to add a new retention policy, My RP, and it should be one day, I can do that there. If I decide I want to delete it, I can also delete it. If I'd like to delete a database, I type delete in here—delete Chronograf—and it will delete it. If I'd like to delete the Telegraf database—in this case, I'm not going to do that. If I want to create a new database, I'd say My DB. Then I can add a new retention policy to it called My RP, and this can last for one day. So, here we have just made a database and managed the various databases. I can create new users. We'll say Michael, and the password will be password. And I can select what permissions I'd like to give them. With the Enterprise version, the types of permissions available to you are more fine-grained than they are in the open source, so you can give specific read and write privileges to certain databases, measurements, tags, or even fields. That's all available in the Enterprise version.
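The same admin operations can be done directly in InfluxQL, which is what the Chronograf admin UI issues behind the scenes. A minimal sketch, using the example names from the demo (`mydb`, `myrp`, user `michael` are illustrative):

```sql
CREATE DATABASE "mydb"
CREATE RETENTION POLICY "myrp" ON "mydb" DURATION 1d REPLICATION 1
CREATE USER "michael" WITH PASSWORD 'password'
GRANT READ ON "mydb" TO "michael"
```

In open source, grants are per-database (READ, WRITE, or ALL); the finer-grained per-measurement and per-field permissions mentioned above are an InfluxEnterprise feature.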

Michael DeSa 41:58.361 And then finally here, we can see some queries. So, let's do something where we'll hopefully be able to see more queries. I can kill a query from here, and I'll get rid of all these queries. As we can see, all of the other queries that are going through right now are very responsive, so they're not taking up a whole lot of resources and are exiting very quickly—we don't even see them come up in this list. And I think that's it. So, if you wanted to add a new InfluxDB source, you can do so here as well—come in and type the address, Influx 2—and you can see that you can add a Kapacitor, Kapacitor 2. You can connect to any of the various endpoints or external services that you have—SMTP, Slack, VictorOps, Telegram, OpsGenie, PagerDuty, HipChat, Sensu, Talk—if you have any of these, you can connect to them through this Chronograf Kapacitor UI. And that closes out everything that I have for the demo, and at this point, I will start taking questions. So, if you can just post your questions in either the chat or the Q&A, I'd love to get a chance to answer them. Thank you again for your time today, and I'll see you in the chat.
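The queries page maps onto two InfluxQL statements. A short sketch — the query ID here is hypothetical; `SHOW QUERIES` returns the real IDs:

```sql
-- List currently running queries with their IDs, databases, and durations
SHOW QUERIES

-- Terminate a long-running query by the ID reported above
KILL QUERY 36
```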

Chris Churilo 43:52.174 Awesome. Hey, Michael, we have a couple questions that are already in the chat. I’m going to get started. Hopefully, you can see them.

Michael DeSa 43:59.387 Yes, let's do that right now. All right—Chris Mateen at CapitalOne has been working on the open-source version of TICK, and asks if it's possible to get a technical POC to answer some questions. Yes, we can do that. The first question I have is: what kinds of transformations can be done in Telegraf? There are various types, but they're somewhat minimal, and the reason for that is we don't want Telegraf to become something like Heka, if you're familiar with that, where it gets bloated with features. So, the types of transformations that can take place in Telegraf are: you can change the names of measurements, tags, or fields, or you can do simple aggregations of fields over some time period. I'd recommend that you check out the Telegraf documentation—it'll give a far more in-depth explanation than I can right now. That would probably also be a good topic to request for a future webinar: advanced Telegraf features, particularly around transformations.
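In current Telegraf versions, these two kinds of transformations are expressed as processor and aggregator plugins in the config file. A hedged sketch — the specific plugins (`processors.rename`, `aggregators.minmax`) exist in Telegraf today but postdate this webinar, so consult the plugin docs for your version:

```toml
# Rename a measurement as it passes through Telegraf (processor plugin)
[[processors.rename]]
  [[processors.rename.replace]]
    measurement = "cpu"
    dest = "cpu_metrics"

# Emit min/max aggregates of each field every 30s (aggregator plugin)
[[aggregators.minmax]]
  period = "30s"
  drop_original = false
```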

Michael DeSa 45:26.989 The next question is: how does a continuous query work with late-arriving or out-of-order data? It's a great question. We haven't looked at the format of a continuous query yet, but there's a RESAMPLE clause that specifies how far back in time you would like the continuous query to run. So, typically, you'll have something where you run the continuous query every minute for the last five minutes' worth of data, grouped by time one minute. What that'll do is, any data that came in out of order or late will be rewritten when the continuous query runs at a later time. It's a very common thing that comes up. Another thing to note is that out-of-order data doesn't matter for Kapacitor: within a time bucket, it'll do any of the necessary ordering for you. I think that should be a sufficient answer.
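The "every minute, for the last five minutes, grouped by one minute" pattern described above looks like this in InfluxQL. A sketch against the sandbox's default Telegraf data (the CQ name and target measurement `cpu_1m` are illustrative):

```sql
CREATE CONTINUOUS QUERY "cq_cpu_1m" ON "telegraf"
RESAMPLE EVERY 1m FOR 5m
BEGIN
  SELECT mean("usage_user") INTO "cpu_1m" FROM "cpu"
  GROUP BY time(1m), *
END
```

Because each run recomputes the last five one-minute buckets, a point that arrives a couple of minutes late still lands inside the resample window and its bucket gets overwritten with the corrected aggregate.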

Michael DeSa 46:39.075 Can we explain a bit more about backup and restore of InfluxData? Yes. There are a number of different ways you can do backup and restore. For the most part, the only place this really comes into play is for InfluxDB: Telegraf is completely controllable via a config file, and you can back up Kapacitor by just capturing its database—it's a very minimal database associated with the Kapacitor instance, so if you just back up that file, it's more than sufficient. For open-source InfluxDB, there's a backup and restore procedure that comes along with the InfluxDB binary, and you can back up an entire database, an entire retention policy, or the entire instance—which isn't necessarily always the best thing, but it does give you the capability. If you're using Enterprise, there are more complex backup schemes you can do. You can back up specific shards, or you can back up specific time ranges, so you can do a rolling backup, so to speak, where you don't back up the entire database every time you run a backup. Alternatively, if you're on open source, you can just do machine-level snapshots, which should be more than sufficient. InfluxEnterprise also has the ability to do more complicated types of restores, so if you want to change the name of the retention policy when you restore, or change the duration of the retention policy, you can do all of that with InfluxEnterprise restore.
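For open-source InfluxDB 1.x, the backup and restore procedure mentioned above uses the `influxd` binary. A sketch under common defaults — the backup directory and data/meta paths are illustrative, and the restore commands here are for the offline-style restore, so check the docs for your version:

```shell
# Back up one database to a local directory
influxd backup -database telegraf /tmp/influx_backup

# Restore: metadata first, then the database's data files
influxd restore -metadir /var/lib/influxdb/meta /tmp/influx_backup
influxd restore -database telegraf -datadir /var/lib/influxdb/data /tmp/influx_backup
```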

Michael DeSa 48:24.274 Did you say the two are mutually exclusive, horizontal scalability versus high availability? They are not mutually exclusive. Within a particular database, the database being the logical database inside of InfluxDB, the concept, that is true. So, you will either have a high replication factor, or you will have a low—which will be highly available and resilient, fault-tolerant, or you can have horizontal scalability and not be fault-tolerant. So, if you lower the replication factor, so I only have one copy of data in my cluster, if my data goes down, I’m still technically highly available, but I’m going to lose a lot of—I’m not going to be fault-tolerant, so I’m going to lose a decent amount of data if I have a low replication factor and I lose more nodes than I can really deal with. So, if I have two-node cluster with a replication factor of one, and one of my nodes goes down, half my data is gone. So, you’re still technically available in that case, yeah, you’re still available, but you’re not resilient to fault tolerance. That may have been a better way to say that. So, they’re not mutually exclusive. So, there’s a couple of other questions in the Q&A.


Chris Churilo 50:18.521 No, I think that's it right now. If there are any other questions, we still have a few minutes—we'll keep the lines open for everybody. I want to thank Michael for a great presentation, as always, and if you have further questions at the end of this webinar, don't forget to go to our community site, where Michael and Jack and a whole bunch of our technical team members are always there to answer any questions that you have.
