Webinar Date: 2018-12-06 08:00:00 (Pacific Time)
In this session you’ll get a detailed overview of Kapacitor, InfluxDB’s native data processing engine. We’ll cover how to install, configure and build custom TICKscripts to enable alerting and anomaly detection.
Watch the webinar “Intro to Kapacitor for Alerting and Anomaly Detection” by filling out the form and clicking on the download button on the right. This will open the recording.
Here is an unedited transcript of the webinar “Intro to Kapacitor for Alerting and Anomaly Detection.” This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcription errors.
• Chris Churilo: Director Product Marketing, InfluxData
• Margo Schaedel: Developer Advocate, InfluxData
Chris Churilo 00:00:06.829 Okay. As promised, it’s three minutes after the hour so let me—oops, hang on. Somebody’s raised their hand. Hello Franco. If you have any questions, just type them in the chat, and I can definitely help you out there. Okay. I think everyone’s all set. All right. Hello everybody. So Franco asked, “Is the recording going to be online?” Yes. Okay. So thanks for joining everybody. My name is Chris Churilo, and we conduct these trainings every Thursday with a variety of topics related to InfluxDB. Today’s training is all about an introduction to Kapacitor. The session is being recorded, and I will be posting it later on today. The email will get sent tomorrow so that you can have the link, but it will be the same link that you used to register for the webinar, so it’s pretty easy to get to. If you have any questions during the session at any time, please feel free to type your questions in the Q&A or in the chat panel; either one will work. And if you are interested in actually talking to Margo, just raise your hand and I can unmute you, and you can ask your questions that way, so whatever makes it easier for you. I want to also remind everybody that if you have questions after the webinar, you can either email them to me and I’ll forward them to Margo, or, the best place really, is to post your questions in the community site. That way, it’s not just Margo and I answering those questions; we get the rest of the team answering the questions for you. In addition, if there are any questions that Margo and I cannot answer for whatever reason today, I generally will post those in the community site, so that everybody on today’s session can go to the community site and see the answer that’s generated by one of our esteemed developers. So with that, I’m going to pass the mic over to Margo and let her introduce herself and get this training going.
Margo Schaedel 00:02:04.652 Okay. Thanks Chris. So welcome everyone. Good morning. As Chris said, I’m Margo, and I’m a developer advocate here at InfluxData. This is part of the getting started series. Today we’ll be talking about an introduction to Kapacitor and TICKscript. So let’s take a look at the agenda for today. Throughout the webinar we’ll be talking about what Kapacitor is; we’ll go into a little deeper dive on that. We’ll cover how to install Kapacitor. Excuse me. We will look at TICKscript and try to understand it, specifically the syntax and how you can write it. And we’ll be taking a look at stream tasks and batch tasks, which are very important in Kapacitor. We’ll look at how to trigger alerts. We’ll, as I said, look at batch versus stream and understand the different use cases for both of them. We’ll also look at Kapacitor as a downsampling engine for when you’re trying to cut back on all your time series data, and as an alternative to continuous queries. So we will be covering a lot today, so please feel free to ask me any questions at any time, and we can stop and help you out.
Margo Schaedel 00:03:36.939 Okay. So before we jump into just Kapacitor, let’s briefly go over InfluxData’s functional architecture. The Influx platform has four different components. So Telegraf, that’s our agent for collecting and reporting metrics and events. It’s plugin driven. You can just drop it in and then start sending data over to InfluxDB. InfluxDB is our purpose-built Time Series Database. It’s optimized for high-volume data ingestion and output. It also uses very little disk space, and it has a lot of built-in features which are really great for handling your time series data. Chronograf, that’s our visualization interface for the InfluxDB platform. So here you can create dashboards and see how your data looks over time. And then Kapacitor, which is what we’ll be covering today. This is our real-time streaming data processing engine. So in Kapacitor, we can trigger alerts on our data, we can downsample it, transform it. We can do a lot of things. And then, we can also send it back into InfluxDB afterwards.
Margo Schaedel 00:05:03.299 So just a couple areas where you might want to consider using this type of platform. We see a lot of people using it in the IoT space for collecting sensor data. It’s really big in the DevOps and monitoring scene. And we also see it quite a bit in real-time analytics, or for collecting application performance metrics.
Margo Schaedel 00:05:31.761 All right. So as I said before, some of the key Kapacitor capabilities. With Kapacitor, you can set thresholds to trigger alerts when your data goes over a certain point. And then, you can be alerted immediately. So that’s very handy especially in the DevOps team. And then, you can also process ETL jobs. So you can extract the data from the database. You can manipulate it and transform it. And then, you can load it back into the same database or another database. So that’s really great for things like downsampling when you want to cut back on older data.
Margo Schaedel 00:06:22.181 Okay. So just to go into Kapacitor in a little more depth. Kapacitor is our real-time streaming data processing engine. As I just said, it’s really great for ETL jobs where you can grab that data, manipulate it, and transform it in some way, and then push it back into the database. We can process the data from InfluxDB both as a stream task and as a batch task, and we will talk a little bit later on about when to use which type. You can also put in your own custom logic or user-defined functions to trigger alerts with dynamic thresholds. Excuse me. You can match metrics for patterns. You can look into computing statistical anomalies. So that’s great for being able to see what’s going on in your data. And you can perform specific actions based on these triggered alerts. For example, dynamic load rebalancing. That’s a big one. As I said, downsampling, that’s really key with time series data, and we’ll talk a little bit about that later on. And then for alerting, you can have those alerts get logged, or you can have them sent over via Slack, HipChat, PagerDuty, OpsGenie, or just regular emails. So there’s a number of mediums that you can use to receive those alerts when something is triggered.
Margo Schaedel 00:08:08.770 And just one more thing, here you’ll notice in the diagram. You can actually pull data directly from Telegraf. So you can have Telegraf send data to Kapacitor first. And then, have it transformed in some way. And then, send it on to InfluxDB or another database. So that’s really cool that you could do it from Telegraf or you could pull it from InfluxDB. It’s pretty versatile in that sense.
Margo Schaedel 00:08:46.030 Okay. So how do we install and start Kapacitor? All the different components are pretty similar in the installation process. You can find the available builds at this link up here on the top, portal.influxdata.com/downloads. The instructions shown here are for Ubuntu and Debian, but you can find instructions for other builds. I use Homebrew because I have a Mac, so I just install it that way. You can also pull a Docker image if you want to run it in a container. And there are also Windows and Linux versions available at that link as well. So it’s really simple and pretty easy to get installed and up and running. And you can install all the components together or install them separately. Yeah. It’s pretty easy. So check out that link. You can get up and running very, very quickly.
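[Editor’s note: the install steps being described look roughly like the following. The release version in the download URL is illustrative; check portal.influxdata.com/downloads for the current build.]

```shell
# Ubuntu / Debian (version number is illustrative)
wget https://dl.influxdata.com/kapacitor/releases/kapacitor_1.5.1_amd64.deb
sudo dpkg -i kapacitor_1.5.1_amd64.deb
sudo systemctl start kapacitor

# macOS with Homebrew
brew install kapacitor

# Docker
docker pull kapacitor
```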
Margo Schaedel 00:09:57.383 Okay. So let’s start talking about how to define tasks in Kapacitor. Here we’ll be learning how to write our own tasks in Kapacitor using TICKscript, which is the scripting language that Kapacitor uses. Before we do that, let’s talk briefly about how, when you’re writing a TICKscript, you have to define your task as either a batch task or a stream task. They’re a little different, so they have advantages and disadvantages based on what you’re trying to do. If we talk about stream processing first, for this, data points are read in a classic data stream; they’re read point by point as they arrive. For this, Kapacitor subscribes to all writes of interest in InfluxDB, and then you could potentially manipulate that data and send it on to InfluxDB that way. As we’ll see later, the syntax for this will always be stream and then the from node. That’s kind of the data flow relationship: you have to define where the data is coming from. Batch processing is a little different in that we batch the data; we have windows of data. So you would take a frame of historic data that would be read from the database and then processed. Here, you have to actually query InfluxDB to get that window of data, and then perform some kind of transformation on it and send it onwards. So you’ll see when we write a batch task, the syntax is always going to be batch and then the query node. So that’s a little bit of how the data flow relationship works for that.
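[Editor’s note: a minimal sketch of the two task shapes being described. These are two separate scripts; the measurement and query here are just placeholders.]

```tickscript
// Stream task: Kapacitor subscribes to writes and sees points as they arrive
stream
    |from()
        .measurement('cpu')

// Batch task: Kapacitor periodically queries InfluxDB for windows of data
batch
    |query('SELECT mean("usage_user") FROM "telegraf"."autogen"."cpu"')
        .period(5m)
        .every(1m)
```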
Margo Schaedel 00:12:03.650 Now how do we decide which one to use. Well, that’s going to depend on your system resources, what kind of computation, or transformation you’re trying to achieve. For instance, if you’re working with an extremely large set of data over a very long time frame, you’re probably going to want to use batch. Because that’s going to leave the data stored on disk until you actually need it. Although when you actually query InfluxDB, that will result in a sudden high load on the database. but if you were to use stream processing for that, that would mean potentially holding on to billions of data points in your memory. So there are some pros and cons for that. However, if you’re working with smaller time frames, it’s pretty good idea to use stream processing as that’s going to lower the query load on InfluxDB. Okay.
Chris Churilo 00:13:07.536 And also Margo, you wrote a pretty good blog on that. And I put the link into the chat panel that also goes—
Margo Schaedel 00:13:14.124 Oh yeah, yeah. Sorry, I forgot to mention that. Yeah, yeah. I just had a blog on batch processing versus stream processing. And some of the different use cases around it. So I definitely encourage you to check that out. It goes into a little bit more detail I think there. Thanks Chris.
Margo Schaedel 00:13:33.576 Okay. So let’s take a look at TICKscript. When we use Kapacitor, we have to use this domain-specific language called TICKscript to define our tasks. So if we want to run any type of ETL job, like downsampling, we have to define our tasks using TICKscript. And you’ll notice when you write a TICKscript, your filename should end in .tick. It’s a chaining invocation language, and the pipe symbol chains together different nodes. Now, what is a node? In TICKscript, the fundamental type is the node. A node can have properties, and it can also have chaining methods. A new node can be created from a parent or sibling node using a chaining method of that parent or sibling node. And for each node type, the signature of this method is the same regardless of the parent or sibling. The chaining method can accept zero or more arguments used to initialize the internal properties of that node instance. Some common node types you’ll see are batch, query, stream, from, eval, and alert. And you can see some of those here in this example. There are obviously a lot more than the ones I’ve just named. The top-level nodes are stream and batch. So you’ll see that at the top up here: var data = stream. That’s where you’re defining the processing type of the task. Those don’t take any arguments; they’re simply declared. And then you’ll see down here, variables refer to values: strings, integers, floats, durations, and pipelines. So a variable can reference literal values, which are strings, integers, floats, or a node instance with methods that can be called.
Margo Schaedel 00:15:52.232 And then, just to talk a little bit more about methods before we continue on. There are property methods. So a property method would modify the internal properties of a node and returns a reference to that same node. And those property methods are called using the dot notations. So you can see that up there. The dot refers to specific attributes on the node. And then the chaining method, that would create a new child node and returns a reference to it. And those are called using a pipe notation.
Margo Schaedel 00:16:27.944 So each node type is going to want data in either batch or stream mode; some of them can do both. And they’re also each going to want to provide data in batch or stream mode, and again, some of them can do both. So some of the use cases we see for that: if a node wants a batch but you provide a stream, that could happen when you’re computing an average, or a min, or a max. However, if it wants a batch and you provide a batch, that’s seen commonly if you’re trying to identify outliers in a batch of data. If it wants a stream and you provide a batch, that’s often seen when you’re grouping together similar data points. And if it wants a stream and you provide a stream, you’ll see that when applying a mathematical function like a logarithm to evaluate the point. And then lastly, before we move on, pipelines are chains of nodes that are logically organized along edges and cannot cycle back to earlier nodes in the chain. Nodes within a pipeline can be assigned to variables, as I said earlier, and this allows the results of different pipelines to be combined; you could use a join or a union node. This also allows sections of the pipeline to be broken into reasonably understandable functional units. Excuse me. In a simple TICKscript, you might not even need to assign pipeline nodes to variables. And the initial node in the pipeline, as I said, sets your processing type. So that’s where you would set stream or batch. And you cannot combine those two types of pipelines, so just remember, you have to choose one or the other.
Margo Schaedel 00:18:26.452 All right. So let’s take a look at the TICKscript syntax. Some of the biggest issues we see people run into are around the quoting rules. So you’ll see here, double quotes are used to reference data in a lambda expression. If you take a look down here where it says, “pipe where lambda is equivalent to true,” you’ll see that is_up and my_field are both in double quotes. That’s really important to remember because, otherwise, it will trigger an error. Single quotes, we see those used for literal string values. So if you take a look at var measurement = ‘requests’, that’s contained in single quotes, because measurement is represented by the string requests. Here we’re using lambda expressions to define transformations on data points, and also to define boolean conditions that act as filters. We try to make it as similar as possible to InfluxQL’s WHERE clause, but the differences are that, again, lambda expressions need to use double quotes; the field identifiers and the tag identifiers need to use double quotes. You’ll see the comparison operator here uses two equal signs, not just one. And then, all lambda expressions need to begin with the lambda: keyword. Just before I go on, does anybody have any questions? I know that was a lot of information.
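[Editor’s note: a small sketch of the quoting rules just described. The measurement, field, and tag names here are the illustrative ones from the slide.]

```tickscript
// Single quotes: literal string values
var measurement = 'requests'

stream
    |from()
        .measurement(measurement)
    // Double quotes inside a lambda expression reference fields and tags;
    // comparison uses two equal signs, and every expression starts with lambda:
    |where(lambda: "is_up" == TRUE)
    |eval(lambda: "my_field" * 2.0)
        .as('my_field_doubled')
```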
Chris Churilo 00:20:14.351 Not right now.
Margo Schaedel 00:20:15.587 Okay. Perfect. All right. So let’s take a look at a pretty simple example of a stream TICKscript. Okay. So this one here is basically logging all the data from the measurement CPU. So you’ll see at the top stream. That’s where you define what type of processing you’re going to do. And stream is always followed by from. So you have to say where is the data coming from. So here we’re using measurement. And then, we define what measurement we’re going to stream from CPU. And then, we follow that with another node log. So we just log all the data coming from measurement CPU. And it’s stream so it’s coming in point by point.
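[Editor’s note: the simple script being walked through here, reconstructed from the narration, looks like this:]

```tickscript
// Log every point written to the cpu measurement as it arrives
stream
    |from()
        .measurement('cpu')
    |log()
```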
Margo Schaedel 00:21:14.164 Okay. So if you were to create a stream task, look at the left side here. That’s the actual TICKscript file, following the TICKscript syntax. The right side is showing how you could define your task in the terminal. So they’re obviously a little bit different. On the left, stream is where you’re defining the type of processing, and from is telling where you’re going to grab the data from. So you’re grabbing a measurement. Which measurement? cpu. And then you’re logging that data as it comes in. On the right, you’ll notice you have to first define your task. So kapacitor define cpu. You point it at the file cpu.tick. Then you have to designate which type of processing task it is: is it stream processing or is it batch processing? And then, dbrp is the database and retention policy. So you’re designating which database you’ll be streaming the data from. That’s telegraf. And autogen is the default retention policy, which you can change if you need to.
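[Editor’s note: the terminal command being described is roughly the following; the task name and file path are the ones used on the slide.]

```shell
kapacitor define cpu \
    -type stream \
    -tick cpu.tick \
    -dbrp telegraf.autogen
```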
Margo Schaedel 00:22:46.947 Okay. So here on the right, you can see some of the output from that kapacitor define task. You’ll see down towards the bottom, it’s created that task for you. So it says “TICKscript,” and then it gives you the filename cpu.tick. And it shows the same TICKscript that we saw in the previous slide, where we had stream, from measurement cpu, and then log that data. So that’s just showing you a possible output from writing this simple TICKscript in your terminal.
Margo Schaedel 00:23:39.584 All right. Let’s make things a little more interesting than just pulling all the data from cpu and logging it. Here, we’ve added in a couple extra nodes. So we’re still streaming it; it’s still a stream task. And we’re grabbing that data from the cpu measurement. But we’re now adding in window. When you add in the window node, you need to designate the period property: how much time is in this window? So the window interval is five minutes, but it’s going to emit every one minute. That one minute is the frequency with which this is going to run. And then the following node is mean, so that’s finding the average. You’re finding the average of usage_user; usage_user, that’s a field. And then you’ll see the property .as, so that’s renaming the result. Once you’ve calculated the usage_user average, we’re calling it mean_usage_user. And then after that, we are logging the result. So that’s a little bit more complex, but still pretty readable overall, I think.
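[Editor’s note: reconstructed from the narration, the windowed version of the script looks like this:]

```tickscript
stream
    |from()
        .measurement('cpu')
    // Collect points into 5-minute windows, emitting a window every minute
    |window()
        .period(5m)
        .every(1m)
    // Average the usage_user field and rename the result
    |mean('usage_user')
        .as('mean_usage_user')
    |log()
```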
Margo Schaedel 00:25:13.436 All right. Here we are adding even more. So we’re still running a stream task. We’re running it from the measurement cpu. And now we’re adding a where clause, so you’ll see pipe where, and then we use a lambda expression. You can notice here, we use the double quotes, so cpu is in double quotes. We’ve got the two equal signs, following the syntax. And then cpu-total: we’re looking for points where the tag is cpu-total. And then, we’re still running the same as before. We’re setting a window with a period of five minutes that will emit every one minute. And then from there, we’re calculating the average usage_user, and we’re renaming it to mean_usage_user. And then, we’re logging that result. So basically, what we’ve added here is the filter based on the tag cpu being equivalent to cpu-total.
Margo Schaedel 00:26:25.832 Okay. So next up is adding alerts. TICKscript is commonly used to write and define tasks for alerting. Here you can see how we can add that in, because it’s not as fun just to simply log all of the results. Just to build on what we’ve already been talking about: we’re still running a stream processing task, so the data is coming in point by point. We’re grabbing it from the cpu measurement. We’re filtering that cpu measurement based on the tag cpu-total. We’re creating the data as five-minute windows that are going to emit every minute. We’re grabbing that usage_user field, calculating the mean for that, and renaming it to mean_usage_user. And then with all of the results of that, we’re only going to alert when it reaches the critical point of mean_usage_user being greater than 80. So that’s essentially where you’re setting a threshold, and you’re saying, “Okay. Well, if the results that are coming out of the previous one are over 80, then we want to send a message.” And say, “cpu usage high.” So that’s going to be the message that’s triggered if this threshold is crossed. And then, Slack is the medium through which you can send out that message. So here you see slack, and then the channel property; it’s going to go into the alerts channel. So if it crosses that threshold, you’ll see a Slack message saying, “cpu usage high!,” in the alerts channel. And then, you can add on multiple mediums. So here you’ll also be getting an email; it will send out an email with that message to this email address. Okay. So that was some examples of stream processing tasks.
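[Editor’s note: the full stream alerting task, reconstructed from the narration. The Slack channel and email address are illustrative placeholders.]

```tickscript
stream
    |from()
        .measurement('cpu')
    // Keep only the aggregate cpu-total series
    |where(lambda: "cpu" == 'cpu-total')
    |window()
        .period(5m)
        .every(1m)
    |mean('usage_user')
        .as('mean_usage_user')
    // Fire a critical alert when the 5-minute average crosses 80%
    |alert()
        .crit(lambda: "mean_usage_user" > 80)
        .message('cpu usage high!')
        .slack()
            .channel('#alerts')
        .email('ops@example.com')
```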
Margo Schaedel 00:28:49.698 Now let’s see what the syntax looks like if we create a batch TICKscript. Here you’ll see the first word, batch, at the top. You have to define the type of process, so we call it batch. And it’s always followed by query, because batch processing is always going to require querying InfluxDB. So here you’re going to query the database, and your query follows InfluxQL. You’re going to select the mean of usage_user (so you’re grabbing the average of the usage_user field), and then you’re going to rename it mean_usage_user. And then you have to define where you’re grabbing this data from. So you’re grabbing it from the telegraf database, with the retention policy autogen, and cpu is the measurement. So if you’re familiar with InfluxDB and InfluxQL, you’ll be much more comfortable with this, because it follows the same syntax. Below that, you’ll see it’s setting the interval, the window, to be five minutes, but emitting every one minute. And then it’s logging the results. So you do see some similarities between stream and batch. But probably the biggest difference to remember is that batch has to be followed by a query, because you need to query the database, whereas stream has to be followed by from, so that you can define where you’re grabbing the measurement from.
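[Editor’s note: the batch version of the script, reconstructed from the narration:]

```tickscript
batch
    // InfluxQL query run against the telegraf database, autogen retention policy
    |query('SELECT mean("usage_user") AS mean_usage_user FROM "telegraf"."autogen"."cpu"')
        .period(5m)
        .every(1m)
    |log()
```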
Margo Schaedel 00:30:32.708 Okay. So here very similar, you can see the TICKscript syntax on the left side. That’s the same as the TICKscript we just viewed on the previous slide. On the right, you’ll see how you can create that in the terminal. So you must define that you want a batch CPU. So it’s going to create your batch_cpu.tick file. You define that it’s a batch processing type. And you define that you’re using Telegraf, autogen as your database and retention policy. And then from there, you can see. Did I skip one? No. Okay.
Margo Schaedel 00:31:18.031 So here you’ll see a little bit more complexity, but you can still see some similarities to the stream processing tasks we looked at. So batch, query, and you put in your database query. It’s the same SELECT mean(“usage_user”) AS mean_usage_user, so you’re renaming that result. And then you’re grabbing that from the database telegraf, the default retention policy, and the cpu measurement. You’re setting your window to be five minutes, emitting every one minute. So that’s going to query every one minute. And then, you’re also setting an alert. So you can still set alerts with these types of tasks as well, and this is very similar to before. So when it reaches that critical threshold, when mean_usage_user is greater than 80. And you’ll see here, because you’re still using a lambda expression, you need those double quotes around the field name. So when it crosses that threshold, you’re going to create the message, “cpu usage high!”. And you will be sending that to Slack, to the alerts channel, and then also sending it to that email that’s listed at the bottom. So that latter part of it is very similar, pretty much the same as the stream task.
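[Editor’s note: the batch task with alerting, reconstructed from the narration. As before, the Slack channel and email address are illustrative.]

```tickscript
batch
    |query('SELECT mean("usage_user") AS mean_usage_user FROM "telegraf"."autogen"."cpu"')
        .period(5m)
        .every(1m)
    // Same alert chain as the stream version
    |alert()
        .crit(lambda: "mean_usage_user" > 80)
        .message('cpu usage high!')
        .slack()
            .channel('#alerts')
        .email('ops@example.com')
```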
Margo Schaedel 00:32:55.782 Okay. So just to reiterate, because there are definitely some differences between batching and streaming. We did talk about it already, but I just wanted to give a couple examples where you might use batch tasks over streaming tasks. So we just saw that when you’re running a batch task, Kapacitor is querying InfluxDB periodically, right? We saw that we were creating the five-minute windows of data and querying the database every one minute. So with a batch task, you avoid having to buffer much of your data in RAM. We most often see this when you want to perform aggregate functions. So if you want to, say, find the mean, max, or minimum of a set interval of data. If you have a set window of data that you want to perform functions on, it’s a good idea to use a batch task. Other cases are when alerting doesn’t need to run on every single data point. Sometimes you’ll notice that state changes don’t happen that often, so you don’t want to get alerts for every single data point coming in. If you use a batch task for that case, because you’re setting windows of data, you’ll see the state changes less often, so you won’t get inundated with alerts. You also want to use a batch task when you’re downsampling, because downsampling takes a large collection of data points and then only retains the most significant data. And we’ll talk a little bit more about that later. Other cases when you might want to use batching are where a little extra latency doesn’t severely impact your operation. There is a slight latency that happens with batch tasks versus streaming tasks. And then lastly, cases with a super high throughput InfluxDB instance: you’d want to use a batch task for those, because Kapacitor cannot process data as quickly as it can be written to InfluxDB. This most often occurs with InfluxDB Enterprise clusters.
Margo Schaedel 00:35:20.584 Now for stream tasks. Because stream tasks create subscriptions to InfluxDB, so that every data point written to Influx is also written to Kapacitor, that’s going to use up a lot of your available memory. So that’s a really key factor to take into consideration, and that’s why you’re going to want to do this with small sets of data. So here’s where stream processing can be most ideal. If you want to transform each of your data points in real time, so if you want to avoid that very slight latency that happens with batch tasks, you should use streaming. Basically, cases where the lowest possible latency is important to the operation. For example, if your alerts need to be triggered immediately, running a streaming task will ensure the least possible delay. Also, cases where InfluxDB is handling a high-volume query load. If you want to alleviate some of that query pressure from InfluxDB, Kapacitor can run a stream task that will help remove some of that pressure. And then one thing to note: stream tasks understand time by the data’s timestamps, so there aren’t any [inaudible] conditions for when exactly a given point will make it into a window or not. However, with batch tasks, you’ll see that it’s possible for a data point to arrive late, and then it could get left out of its relevant window. All right. So any questions there, or about anything so far?
Chris Churilo 00:37:05.038 Yep. We do have some questions. So Viswashro asked, “Can we use Kapacitor as a rule engine in an IoT use case?” In particular, he’s talking about Telegraf collecting data from an MQTT broker. And then, sending it to Kapacitor.
Margo Schaedel 00:37:21.938 Sorry. Could you say that first part again? A rule engine.
Chris Churilo 00:37:25.346 Yeah. So if you go to the chat, you can see it. Sometimes it’s easier to read [laughter].
Margo Schaedel 00:37:28.702 Yeah. I know. That’s very true. Okay. Here we go. Can we use Kapacitor as a rule engine in an IoT use case? Telegraf getting data from a broker. And then, sending it to Kapacitor. Yeah. I mean, I think that’s perfectly possible. I’d have to like I guess check whether, I think, the MQTT broker should be able to send it over to Kapacitor. But I’m not completely sure about that. I don’t know if Chris you can—
Chris Churilo 00:38:09.240 I mean, you can always send it to InfluxData [crosstalk] or to the database. And leave it that way.
Margo Schaedel 00:38:12.106 –and then, yeah. And then, on to the Kapacitor from there. Yeah. Yeah. So you should definitely be able to do that. And then for scalability, the TICK Stack scales really well. We see a lot of people mostly using the open source and having no issues with it. So I mean, really people don’t start needing InfluxDB Enterprise until they start needing clustering and high availability. So for most of the people I’ve talked to, people have been really happy with the scalability. But I don’t know if Chris, you know any of the actual benchmarks or—
Chris Churilo 00:38:53.106 Yeah. So we do have a set of benchmarks. And benchmarks are always very biased, even though we try not to be biased in our benchmarks. But we create them just so that people can take a quick look at them. They’re all open-sourced; you can even see how we actually conducted the benchmarks. But for the three things that Time Series Databases are known for, which is ingest, then storage (so the compaction that we do), and then query: in those three areas, any kind of Time Series Database is going to do a lot better than any standard database. And even when you compare us to other Time Series Databases, for those three elements, we do really well. So we feel that it scales really well. But the best thing is just to try it. It takes you just a couple of minutes to install and start putting data into the TICK Stack, and you can see for yourself that it works quite well. And we have a lot of IoT use cases. There’s even some content on our website from various users as well as customers. So as Margo mentioned, the majority of our users are just using open source, and some pretty impressive products are scaling quite well with us. All right. Let’s go to the next question, Margo.
Margo Schaedel 00:40:13.264 Okay. So the last one is while creating messages, can we add attributes, names, and values like in the case what was cpu usage at that time? So I think if I understand correctly, in the message for the alert, yes you can put in attributes, names and values. Are you talking about just within the alert message? Because I know that’s definitely possible. But I wasn’t sure if I understood that question correctly. Okay, cool. All right. Yeah. So you can definitely add that in to your message as well. Like if you want to give the actual level at that time. It’s definitely possible. Okay. Let’s soldier on. Back to present mode.
Margo Schaedel 00:41:11.434 All right. So let's quickly talk about Kapacitor for downsampling. So you'll very quickly notice that when you are dealing with and managing time series data, it's coming in at a very high rate and volume. So quickly there will be storage concerns. And also, once some of it is a little older, it's not as relevant. So you don't need to keep as high a precision of data past a certain point. So to combat this, we use downsampling, which is the process of reducing a sequence of points in a series to a single data point. So often you'll see that you'll take a window of historic data, and then you'll compute the average, the max, the min of that window of data. So you'll hugely reduce the number of points in that window. You'll take a huge set of points, and then you'll only get one point out of it. So that's great for when you want to keep historic data, but you don't want to keep it at that high precision. And so downsampling is possible with InfluxDB using continuous queries. But you can see that sometimes running continuous queries can add all this extra load and pressure on InfluxDB. So giving that over to Kapacitor to handle can kind of alleviate some of that pressure. So there is actually a really good blog about this written by another dev advocate, Katy Farmer. She wrote a blog about whether to use continuous queries or Kapacitor for your downsampling purposes.
Margo Schaedel 00:43:08.396 So why should we downsample? Downsampling enables faster queries. And then, we can store less data, so it will take up less disk space. And as I said, if you're running a lot of continuous queries in your InfluxDB instance, that can end up causing your instance to lock up, or it can fall behind schedule, or it can degrade your dashboard query loads. So having the downsampling done in Kapacitor can help to free up your database's resources. Just as a side note here: for downsampling, a streaming task is going to be more performant, while a batch task is still going to have to query InfluxDB and create that extra load. So as I said before, this is something where you really need to consider: what are your system resources? How much data am I going to be dealing with when I want to downsample? And so forth.
Margo Schaedel 00:44:12.285 When you're downsampling, if you want to perform a complex data transformation, I would definitely recommend using Kapacitor to do that rather than a continuous query. However, if you're only doing a limited number of queries, like a very small amount of downsampling, then it's still okay to use InfluxDB continuous queries. Okay. So as I said, when we downsample using Kapacitor, this helps to alleviate the pressure on InfluxDB because we're offloading our computation to a separate host. We can, as we've seen, create a task that can aggregate our data and then write it back into InfluxDB if we want, or send it on elsewhere. And then depending on your particular use case, you would choose either a stream task or a batch task. It's really important to consider this here because batch tasks will have to query InfluxDB, so that will add a little bit of pressure back onto the database. And then, you can write the data back into InfluxDB after you've downsampled it. And you typically would add a different retention policy for that instance.
Margo Schaedel 00:45:33.983 Okay. So we can look at an example real quick here. So this is an example of a batch task downsampling. So this does require querying InfluxDB. So as we've seen before, we define the task at the top with batch, and that's always followed by query, and you place your InfluxQL query inside of that. You can also see that the window is five minutes. So the data is going to be separated into five-minute intervals, and it's going to be queried every five minutes. And then, this influxDBOut method, this is sending it back into InfluxDB, specifically into the Telegraf database with a retention policy of five minutes. And then, it's given a tag, and the source is kapacitor, so that you can tell, "Okay. Where did this data come from?" That can help to differentiate between your current data and your historic data.
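The batch task Margo walks through looks roughly like this sketch. The transcript only specifies the five-minute window, the Telegraf database, and the source tag; the measurement, field, and retention policy names here are assumptions:

```tickscript
batch
    |query('SELECT mean(usage_idle) AS usage_idle FROM "telegraf"."autogen"."cpu"')
        // Separate the data into five-minute windows, queried every five minutes
        .period(5m)
        .every(5m)
    |influxDBOut()
        // Write the downsampled points back into the telegraf database
        .database('telegraf')
        .retentionPolicy('five_minutes')
        // Tag the points so downsampled data can be told apart from raw data
        .tag('source', 'kapacitor')
```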
Margo Schaedel 00:46:43.434 Okay. So just a couple of things to kind of finish up today. Maybe just a little bit of a summary for batch tasks versus stream tasks. So this is all reiterating what we've covered. So batch tasks, as I said, they query InfluxDB periodically. That's why they're followed by the query node. They do use limited memory. However, they can place additional query pressure on InfluxDB, so that's something to definitely take into consideration. Batch tasks are also best for performing aggregate functions and for downsampling, or if you want to process large temporal windows of data.
Margo Schaedel 00:47:31.803 Now stream tasks, on the other hand. Those are subscribing to all writes from InfluxDB, which places an additional write load on Kapacitor but can help to reduce the load on InfluxDB. And in stream task cases, you want to use that where the lowest possible latency is really, really integral to the operation. So generally, it's a good idea to make sure you consider this when you're transforming your data or downsampling it. It's really, really good to figure out, "Okay. Am I putting too much load on InfluxDB? How big are the datasets that I'm going to be processing? And what kind of latency can I expect?" So yeah. So those are definitely some things that you need to take into consideration when you're defining your Kapacitor tasks.
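A stream-task version of the same downsample might look like this sketch (measurement, field, and retention policy names are assumptions, as before). It avoids querying InfluxDB, at the cost of Kapacitor subscribing to every incoming write:

```tickscript
stream
    |from()
        .database('telegraf')
        .measurement('cpu')
    |window()
        // Buffer points in memory into five-minute windows
        .period(5m)
        .every(5m)
    |mean('usage_idle')
        .as('usage_idle')
    |influxDBOut()
        .database('telegraf')
        .retentionPolicy('five_minutes')
        .tag('source', 'kapacitor')
```

The trade-off Margo describes is visible here: the window node holds the data in Kapacitor's memory instead of re-querying InfluxDB, which is why stream tasks shift load off the database but increase the write load on Kapacitor.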
Margo Schaedel 00:48:29.443 And all this I go through in that blog that I wrote. So if you have a minute, it's a pretty quick blog. You can get through it pretty easily. Or if you want to take a look at whether to use Kapacitor or continuous queries, check out that blog by Katy Farmer. She goes into a lot of detail over the pros and cons of that. So yeah. So I think that just about sums up today's webinar.
Chris Churilo 00:49:03.431 Excellent. A lot of great details here. So we're going to keep the lines open for a few minutes. If you have any other questions, please feel free. And it doesn't have to be limited to Kapacitor. If you have any questions about Telegraf or InfluxDB, we're happy to answer those as well. And I just want to remind everybody that the recording will be posted later on today. And I'll just pop in the link really quick so you can get it. Here we go. So it will be here. And you can see, right now on the right-hand side, it says, "Coming soon." So within just a couple of hours that will be switched to Watch Now. And you'll be able to take another listen to what Margo discussed today and her overview of Kapacitor.
Chris Churilo 00:49:50.546 So Margo, now that you've been at Influx for a few months, what's your experience been with Kapacitor? Because I know it probably was a little bit daunting when you first came on.
Margo Schaedel 00:50:01.393 Definitely. Yeah. Yeah. I would say when I first came on, the first thing I was familiar with was Chronograf, because I spent a lot of time with visualization. So that was at least familiar. So I was mostly focused on InfluxDB, and Telegraf, and Chronograf. Kapacitor was just kind of, I was like, "Oh, I'll get to that eventually." But actually, researching a lot for that blog I wrote and hanging out in the community site has helped to make things a lot easier. And from talking to a lot of the developers at InfluxData, it's actually pretty, I wouldn't say simple, but it's a lot simpler than I thought it would be. I think I looked at the TICKscript and I thought, "Oh, that just looks so complicated." But reading through documentation has really helped. The documentation on it is really clear. And I spoke with Michael Desa a few times. He's great for asking any Kapacitor questions, as is Nathaniel. So yeah. They really helped to explain a lot of what was going on. But I also spent a lot of time just going through and trying to write out my own TICKscripts, and then watching them break [laughter], and kind of letting the errors guide the learning process. So yeah. So I think now I feel a lot more comfortable with it. And yeah. And I think I'm probably more comfortable with Kapacitor now than, say, writing continuous queries for InfluxDB. So it seems simpler to me to use. And then, I think Chronograf also makes it really nice, because it has the Kapacitor interface where you write your TICKscript in Chronograf. That's really nice. So it's a lot more manageable now.
Chris Churilo 00:52:18.119 Awesome. And it's really good to hear. So with that, everybody on the training today: don't be intimidated. Just go ahead and try it. And learn from your errors. And another reminder: if you have any questions, we've referred to the community site a number of times. The URL is in the chat panel. It's really easy to get to. And Margo pretty much lives there, so you'll probably see her answering your questions. Looks like I don't see any more questions. So with that, we're going to end the webinar. And I want to thank Margo for doing a fabulous job. The recording will get posted later today. And we hope to see you in the community site. Thanks everybody.
Margo Schaedel 00:52:58.965 Thank you.
Chris Churilo 00:53:00.327 Bye Bye.