Advanced: Kapacitor Event Handlers & Node Alerts
In this webinar, Michael DeSa will provide you with a detailed overview of Alert Handlers and Node Alert with Kapacitor. An AlertNode can trigger an event of varying severity levels, and pass the event to alert handlers. Different event handlers can be configured for each AlertNode. Some handlers like Email, HipChat, Sensu, Slack, OpsGenie, VictorOps, PagerDuty, Telegram and Talk have a configuration option ‘global’ that indicates that all alerts implicitly use the handler.
Watch the Webinar
Watch the webinar “Kapacitor Event Handlers” by clicking on the download button on the right. This will open the recording.
Transcript
Here is an unedited transcript of the webinar “Kapacitor Event Handlers.” This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.
Speakers:
• Chris Churilo: Director Product Marketing, InfluxData
• Michael DeSa: Software Engineer, InfluxData
Michael DeSa 00:01.854 Perfect. Thank you, Chris. So as she mentioned, today’s topic is on alert handlers, which were added in the last release of Kapacitor, and just kind of a more general way to handle alerts or create certain rules based off of the alerts. And so we’ll talk about that in kind of a bit more detail as we go on here. So the agenda for today is we’re going to talk about what a topic is as it relates to Kapacitor. We’re going to show how you effectively use topics in TICKscript. We’re going to explain what a topic handler is and how they are used. We’re going to define our own topic handlers, and then we’re going to talk about a couple special-purpose topic handlers. And then you’ll note that I keep saying topic handler, and you’ll see a specific reason for that as we move on here. So topics in Kapacitor are specified in a TICKscript. Typically, what you’ll do is you’ll say, “Every alert I want to generate to this kind of topic.” It allows users to separate out the handling of alerts from the task that generates them, and it gives the user the ability to handle alerts in kind of a more sophisticated way. And we’ll see what that means as we go on. And it follows a kind of publish-subscribe pattern where I kick off an alert, and I have a number of things that are published as alerts, and I have a number of things that are kind of out there listening for them, and then they do the assorted things. In this case, the alert events are published to a topic, and the handlers subscribe to a topic. So a pool of data for that.
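The publish-subscribe idea described here can be sketched as a toy model in Python. This is only an illustration of the pattern, not Kapacitor's implementation: the `TopicBus` and `AlertEvent` names, the event fields, and the topic and channel names are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AlertEvent:
    """A simplified alert event; the field names are illustrative only."""
    name: str
    level: str      # e.g. "OK", "WARNING", "CRITICAL"
    message: str


class TopicBus:
    """Toy publish-subscribe hub: handlers subscribe to a topic by name."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Handlers can be attached at any time, independently of the publisher.
        self._handlers[topic].append(handler)

    def publish(self, topic, event):
        # Deliver the event to every handler currently subscribed to the topic.
        for handler in self._handlers[topic]:
            handler(event)


bus = TopicBus()
received = []
bus.subscribe("basic", lambda e: received.append(("alerts", e.level)))
bus.subscribe("basic", lambda e: received.append(("on-call", e.level)))

# The publishing side only knows the topic name, not who is listening.
bus.publish("basic", AlertEvent("cpu", "CRITICAL", "usage_idle below 10"))
print(received)
```

The point of the design is that the code calling `publish` never changes when handlers are added or removed, which mirrors the separation between the task that generates alerts and the handling of those alerts.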
Michael DeSa 02:03.266 And the reason why I was calling them topic handlers earlier is that handlers are scoped to a specific topic, meaning that every handler that I define must be defined in the scope of a particular topic. So if I have the topic, say, on-call, I would have a number of different handlers for my on-call topic, and they are kind of separate from any of the handlers that I may have on my production topic or whatever it may be. So just to give you an idea of what a TICKscript would look like without a topic in it, and this is where the handlers are defined explicitly in the TICKscript, we’ll start with this small example here where I’m streaming data from the measurement CPU, grouping things into their associated series, and then I trigger an alert. And I want to trigger a warning if usage idle is less than 20, or trigger a critical if usage idle is less than 10. And then I have two handlers on this alert node, which are the Slack handler and another Slack handler, but each one is triggering alerts to a different location. So in this case, one of them triggers alerts to the Slack channel alerts, and the other one triggers alerts to the on-call channel. So one of the big issues that I’ve mentioned is that all of the handlers here are kind of defined in the TICKscript. So if I want to do something like add a new handler or adjust the things that happen as the result of an alert, it’s a bit of work to do, and I have to redefine the task and restart it. And in the meantime, I can actually lose a bit of state that’s kind of going through my task currently.
Michael DeSa 04:08.139 So suppose that I currently have an alert that’s about to be triggered, and then I stop my task, and the data hasn’t gotten through the pipeline completely. That alert will actually never be triggered. And so I can’t make changes to the actions of my alerts without completely restarting the task. And that was kind of the old way of doing things, which is a little bit problematic. The other issue here is you couldn’t really have sort of cascading alerts. So if I wanted to do something like I trigger my initial, say, on-call alert, and a number of things happen where it sends an email, it sends a number of other things, I couldn’t really segregate those into kind of separate dimensions or name them separately. So I couldn’t have one type of alert trigger a whole different class of alerts just so I could sort of categorize things neatly. So everything that I wanted to have happen had to be defined explicitly on the alert node. And this was a bit painful. On top of that, there was a bit of demand for being able to do kind of condition matching on alerts. So I want to generate an alert for everything that comes through, but I only want to page somebody if the state has changed or if it’s a certain type of categorization, or if the amount of time that the task took was particularly long.
Michael DeSa 05:54.276 So these are kinds of matching conditions that are a little bit harder to describe, that aren’t possible to describe explicitly in TICKscript. And so that’s kind of the scope of the problem that we’re trying to solve where, in the past, everything had to be defined in a TICKscript and you couldn’t really have these cascading alerts. You would lose all the state if you ever wanted to update the actions that happen from an alert. And you couldn’t do this kind of complex or more involved condition matching for the types of alerts that came out. And so that’s why we introduced topics. So a topic is, just as we mentioned, this kind of pub/sub channel where a topic gets published to and there are a number of handlers that kind of subscribe to the results of that topic. So we’re going to go through the process of taking the old TICKscript that we had and kind of making the equivalent thing but in the new topic handler model. So in this case here, we’re streaming data from the measurement CPU, grouping things into their associated series, and then we issue an alert by the same conditions: usage idle less than 20 is a warning, usage idle less than 10 is critical. So here we have the topic basic. We’ve just introduced the topic here. This is actually the way that you create a topic. Just using it in a TICKscript kind of invokes it in the first place.
Michael DeSa 07:39.181 And so when we’re sort of converting to this new topic handler model, we want to remove all of the handlers that were previously in the TICKscript and just replace them with topics. So you can keep a couple of them in there if you want something special for a particular TICKscript, but you don’t really need to do that. So once we have done that, we add in the topic, and we should be kind of good to go. So now that we’ve created the topic, we can actually issue a list command to see all the various topics that exist. So I can issue a Kapacitor list topics, and I get back this data that shows the ID, which is the name of the topic, the current level of the topic, which is okay, which just says that that’s the current status of it, and then how many alerts have been collected by this topic. So in this case, we only have one, but in a more complicated example, you may see a number of different topics that you had generated. On top of being able to list all the topics, I can show a specific topic. So if I say, “Kapacitor, show topic, the name of the topic,” which in this case is basic, I will see a bunch of information. I see the ID for the topic, the current level, the number of alerts that had been collected, and then I see handlers, which we’ll talk about again in just a moment. And then I see a sample of the most recent collected events. So it will show me the event name, the level of the event, and a message associated with it. So I can see everything that kind of comes through here. And actually, in Chronograf, we’ve been exposing this information kind of just for people to see. So I would recommend, if you haven’t yet, that if you utilize topics, in Chronograf you can get this more detailed output of the types of events that are being generated.
Michael DeSa 09:58.437 So this is great. We’ve shown that we’ve made a topic, but at the moment, there are no handlers that have been defined on that topic. So now we’re going to go through the process of defining a topic handler, and to do so requires a number of things. You need a topic. You need to name the handler, and then you need a yaml file that specifies the configuration for that particular handler. And that yaml file needs to specify a number of things. It specifies the type of handler that it is. That’s called kind, and this is typically the property method from TICKscripts. So if you remember in the TICKscript we had earlier, we had a Slack handler, and so the kind here will be Slack. And then you have the required parameters, which are specified as options. So in this case, we had the channel as alerts, or our channel as on-call, and so you just specify them as key-value pairs in yaml. So in this case, you see we have two yaml files. We have alerts.yaml, which is Slack, and it has options channel alerts. And then likewise, we have the on-call one, which is also Slack and has options channel on-call. To define the handlers, we say: “Kapacitor, define-topic-handler. The name of the topic. The name of the handler that you’d like to name it, and then a yaml file.” So in this case, we say, “Kapacitor, define topic handler basic alerts-channel and alerts.yaml.” And then we do the same thing but for our on-call yaml. So, “Kapacitor, define topic handler, basic on-call-chan on_call.yaml.”
Michael DeSa 12:01.709 Now that we’ve added a couple handlers, we can actually list all the handlers by issuing a Kapacitor list topic handlers, and this will just give you back all of the topic handlers that exist. So in this case, we can see the topic, which is basic in our example, and the two different topic handler IDs, which are alerts-channel and on-call-chan, and then their associated kinds. So in this case, they’re both Slack handlers. And we can also show a specific topic handler. So in this case, we’re going to issue a Kapacitor show topic handler, the topic, and then the handler. So in this case, we’re saying, “Kapacitor, show topic handler basic alerts-channel.” And then it shows us the ID, which is alerts-channel, the topic basic, the kind that it is, which in this case is Slack, the match, which is currently empty, but which we’ll talk about in a bit more detail in just a moment, and then any of the associated options, which in this case is channel alerts, and in the other case is channel on-call. So just giving you an idea. This is kind of just the process that you go through when you’re interacting with a topic handler. So you make some changes. You can sort of list them. You can see what’s kind of going on here. Just so you have a way to get some idea of the various topic handlers and topics that are going on.
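To make the scoping concrete, here is a small Python sketch of a registry in which every handler is stored under a topic, with a kind, options, and an initially empty match. It only models the define, list, and show steps described above. The `HandlerSpec` and `HandlerRegistry` names are hypothetical and are not part of Kapacitor.

```python
from dataclasses import dataclass, field


@dataclass
class HandlerSpec:
    """Illustrative stand-in for a handler definition (kind + options)."""
    topic: str
    id: str
    kind: str
    options: dict = field(default_factory=dict)
    match: str = ""   # empty means no match condition has been set


class HandlerRegistry:
    """Toy registry: handlers are always stored under a specific topic."""

    def __init__(self):
        self._specs = {}

    def define(self, topic, handler_id, kind, options=None):
        # Keyed by (topic, id): the same id under another topic is a
        # separate handler, so a handler belongs to exactly one topic.
        spec = HandlerSpec(topic, handler_id, kind, dict(options or {}))
        self._specs[(topic, handler_id)] = spec
        return spec

    def list_handlers(self, topic=None):
        return [s for s in self._specs.values()
                if topic is None or s.topic == topic]

    def show(self, topic, handler_id):
        return self._specs[(topic, handler_id)]


registry = HandlerRegistry()
registry.define("basic", "alerts-channel", "slack", {"channel": "alerts"})
registry.define("basic", "on-call-chan", "slack", {"channel": "on-call"})

for spec in registry.list_handlers("basic"):
    print(spec.topic, spec.id, spec.kind)

shown = registry.show("basic", "alerts-channel")
print(shown.kind, shown.match == "", shown.options)
```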
Michael DeSa 13:35.006 So now that we have actually added a couple of handlers, I want to show you again the Kapacitor show topic basic, which shows the topic itself, and then you’ll see all the associated handlers. In this case, we had the alerts-channel handler and the on-call-chan handler. So just to show you that as you add things, they will get kind of populated here. So you can see all the associated handlers. Next, we have an example where, if you remember, I talked earlier about this kind of chaining various handlers together, and this is kind of a good example of that. So topics can be chained together using the publish action. So this allows you to further group your alerts into various topics. So the example of this is I’m going to have a thing called chain.yaml that is a “kind: publish”, which means it’s going to just do a broadcast, and my options specify the topics that it’s going to broadcast to. So you can think of it kind of in this way where I maybe have an ops team channel where whenever I get a particular type of CPU alert, I want to trigger things to the ops team, or if I get something for, I’ll say, the application team, I can trigger it to the application team, and I can have these kinds of cascading events where I can kind of pick and choose what goes where. And we’ll see in just a moment when I get into adding a match condition where you can start doing things like send everything to the application team, but only send critical alerts to the ops team, or send everything to a Slack channel, but only send critical alerts to PagerDuty, things like this.
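The chaining idea can be sketched in Python as a handler whose only job is to re-publish an event to other topics. This is a toy model under stated assumptions, not how Kapacitor implements its publish handler: `TopicBus`, `publish_handler`, and the topic names `ops_team` and `app_team` are made up for illustration.

```python
from collections import defaultdict


class TopicBus:
    """Toy publish-subscribe hub used to illustrate chaining."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, event):
        # Iterate over a copy so handlers may publish to other topics safely.
        for handler in list(self._handlers[topic]):
            handler(event)


def publish_handler(bus, topics):
    """Return a handler that broadcasts each event to the given topics."""
    def handle(event):
        # Note: this sketch has no cycle detection; chaining a topic back
        # to itself would recurse forever.
        for topic in topics:
            bus.publish(topic, event)
    return handle


bus = TopicBus()
log = []
bus.subscribe("ops_team", lambda e: log.append(("ops_team", e)))
bus.subscribe("app_team", lambda e: log.append(("app_team", e)))

# The "chain" handler on topic basic forwards events to both team topics.
bus.subscribe("basic", publish_handler(bus, ["ops_team", "app_team"]))

bus.publish("basic", "cpu is CRITICAL")
print(log)
```

Anything the ops team or application team has subscribed to its own topic fires as a consequence of the original alert, which is the cascading behaviour described in the talk.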
Michael DeSa 15:38.351 So again, to define this topic handler, I say, “Kapacitor, define topic handler, the name of the topic, which in this case is basic, the name of the handler, which in this case is chain, and then the yaml file that specifies the contents of the action.” So here, we’re getting to an example of using the match conditions that I had talked about. So conditions for matching a handler may be set in the match section. It must be a Boolean expression. So whatever goes into this match here must be a Boolean expression. And to get an idea of what you’re allowed to put in there, it’s anything that is a kind of built-in Kapacitor function. We actually take this value and wrap it in a Kapacitor lambda, and then evaluate it to get the resulting value. So any Kapacitor built-in functions work, and there’s a number of other functions that are special purpose that we’ve added, namely, the changed function. So changed tells you whether or not the level of the alert has changed. So if it’s gone from a critical to a warning or it’s gone from an okay to a critical, changed will tell you whether or not that actually took place. In this case, you actually don’t need the changed() equals true. You could just say changed. There’s also the function level, which will tell you the current alert level. So is it critical? Is it okay? Is it a warning? Is it info? You can get any of these, or you can get the name of the alert that has been generated. You can get the task name so that you may only alert certain people for certain tasks, and then you can get the duration. So how long the alert, sort of, took place. So as I mentioned previously, the idea for this is you can start to sort of craft complex kind of routing rules or conditions for who gets alerted on what condition.
So the example that comes to my mind is this changed here where I want to be generating alerts that say go to Slack or go to some sort of common alert place anytime anything happens, but I really only want to notify people in the case that the value is changing or has changed.
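A rough Python sketch of the match idea follows: every event goes to one handler, while another handler only fires when the level has changed. In Kapacitor the match is a Boolean expression in the handler definition. Here it is modelled as a plain Python predicate, and the `AlertEvent` fields and `matching_handler` helper are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class AlertEvent:
    """Simplified event that remembers the previous level (illustrative)."""
    name: str
    level: str
    previous_level: str

    def changed(self):
        # True when the alert level differs from the previous one.
        return self.level != self.previous_level


def matching_handler(match, action):
    """Wrap an action so it only runs when the match predicate is true."""
    def handle(event):
        if match(event):
            action(event)
    return handle


slack, pages = [], []
handlers = [
    # No condition: every event is recorded, like a common alerts channel.
    matching_handler(lambda e: True, slack.append),
    # Only notify people when the level has actually changed.
    matching_handler(lambda e: e.changed(), pages.append),
]

events = [
    AlertEvent("cpu", "WARNING", "OK"),
    AlertEvent("cpu", "CRITICAL", "WARNING"),
    AlertEvent("cpu", "CRITICAL", "CRITICAL"),   # level unchanged
]
for event in events:
    for handle in handlers:
        handle(event)

print(len(slack), len(pages))
```

The third event reaches the unconditional handler but not the one guarded by `changed()`, so repeated critical events do not notify people again.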
Michael DeSa 18:23.846 So this would allow you to do that. So you only notify the ops team if the value has changed, and you publish it to the ops team, and then any kind of alerts that the ops team has set in place will also be triggered. And it kind of just gives you a way to do a more complex routing of these rules. All right. So we can also show the topic handler. So we can see that match condition filled in. In this case, we have the changed equals true. We could’ve just had changed here. Everything else is basically the same. And so to give you a kind of summary, a little short one here. The summary is Kapacitor users can utilize topics instead of explicitly handling alerts in TICKscript. It requires the use of topic handlers, obviously. Handlers are scoped to a particular topic, meaning that one topic has many handlers, and a handler belongs to a single topic. So it can’t have this kind of many-to-many relationship. Topics can be chained together using the publish handler. Handlers have additional matching logic that allows for more sophisticated event handling. So if I want to do something like these cascading things on certain conditions, or only notifying certain teams on certain conditions, you could do so. So if you have any other sort of questions, we do have a short tutorial in the guides section of our documentation. So you could take a look there and sort of get an idea of a little bit more in-depth example of how this would actually be utilized. And obviously, we’ll have these slides that are also available. On that note, I’m kind of out of content here. So shorter one today. But I’m happy to hang around to answer any questions that do come up.
Michael DeSa 20:36.953 We do have one question right now, which is how do we go about defining our own kind? So the way you would do that is, all the types of handlers that were previously available in TICKscripts, so Slack, VictorOps, PagerDuty, there’s this kind of SMTP. There’s a huge array of them that you could go check out. All of those will work as various different kinds. If for some reason you see that there’s not a kind that exists for you, we have some documentation on how to implement your own handler. It will involve writing a little bit of Go code, but there’s a very sort of standard process. There’s not a whole lot of thinking you need to do. It’s just kind of hooking up pipes and a very small amount of coding. So to define your own kind, you’d have to implement your own handler. And then once you’ve done that, everything should kind of just work from there. Any other questions?
Chris Churilo 21:53.185 So I see we have a question on the slide downloads. Yeah, we can post this on a SlideShare. Not a problem. And as Michael had mentioned, we’ll stick around for other questions about this topic or anything about Kapacitor actually. And so don’t be shy. Go ahead and put your questions in the Q&A or the chat panel.
Michael DeSa 22:18.152 Yeah. I’ll be answering anything.
[silence]
Chris Churilo 22:34.069 So, Michael, can you go back to that resources page?
Michael DeSa 22:37.185 Yeah. Oops. There you go. So this is for how to use topics, and then I will get the information for writing your own handler. So I’ve responded in the Q&A with a link to an example pull request of adding a new kind. So it goes through kind of all the things you’d need to do for that. So we have a question in chat that says, “We’re new to Influx, and we’re looking to use Grafana. Would your recommendation be to set up all of our alerting within Kapacitor?” So there’s two ways to do it. One thing that’s a little bit difficult, particularly with Grafana, is Grafana and Kapacitor don’t integrate with one another yet. And so any of the alerting that you did would be alerting that’s a part of Grafana. So Grafana has their own implementation of alerting. It does sort of basic thresholds, but if you need something that’s more sophisticated with these cascading kinds of alerts or these kinds of dynamic rules, to my knowledge, Grafana doesn’t have anything that’s quite as powerful just yet. So with Grafana, I would definitely recommend to do your alerting with Kapacitor. And consider using Chronograf or at least giving Chronograf a try to see if it sort of meets your needs. At this point, Grafana and Chronograf have near feature parity. Grafana, obviously, has more features, but all the base features that are really kind of necessary are usually a part of Chronograf already. And Chronograf has built-in support for Kapacitor, and you can create rules and deal with all the alerts. So it’s entirely something you can do.
Michael DeSa 24:42.106 So the next question is, “Can Kapacitor be used with Prometheus for alerting?” And the answer is yes. We can scrape any Prometheus targets, and you can either write that data into InfluxDB or do any kind of alerting that you would like to do with that. I can find some more information on that. Prometheus, there. So let me pull up. We have a blog post and a repository that can be used. I believe this is the one. So here’s the link to being able to integrate with Prometheus targets via Kapacitor. There should be a number of other kind of files that you can use. Let me see if I can get the other. So there’s another question. “Are there any good resources for Kapacitor?” I would definitely recommend taking a look at the documentation. We’re kind of always in the state of improving our documentation, but the documentation is definitely a good place to start. The next question I have here is, “Is there a best practice for managing and deploying TICKscripts that’s for topics? There are some Ansible roles floating around, but nothing is up to date.” Yeah. So this is actually a project that is slated for the next release, the 1.4 release: to have a standardized way for managing and deploying TICKscripts. Particularly, it’s a little bit difficult when, say, you make a change. You don’t want to be interacting with the API as much. So we’re working on a number of things. If you have any suggestions, please do send us an email. We’re still kind of in the design phase for what we think that will end up looking like.
Michael DeSa 26:58.058 The current kind of thinking is maybe a directory where we place all this information, and then anything that is in there will get loaded into Kapacitor. So you don’t have to manage things individually. The next question is, “Do you have a plan to implement InfluxDBOut as an event handler? I see that it isn’t currently supported.” So we didn’t have plans for this, but this has been increasingly requested. In fact, I just had somebody request it within our own company. So we’re kind of redoing our own internal monitoring. And so this specific feature was asked for. If you are particularly interested in it, please do open an issue on the Kapacitor repo. That way we can kind of more properly at least track it. And I’m happy to add it if you would feel comfortable doing so.
Chris Churilo 28:09.062 You’ve got just a note from Hans. He says he likes the topics feature, and he’s going to check out the documentation further.
Michael DeSa 28:18.106 Awesome. Glad to hear it. And if there’s anything that comes up, please don’t hesitate, or any questions, or additional documentation, please either ask on our community channel, or if you find an issue, open up an issue on our repo. We’ll be happy to help you out.
Chris Churilo 28:45.049 Okay. We’ll keep the lines open for a little bit longer. I think Michael did a really great job of covering this topic, and there’s a ton of resources available to everybody. And always feel free to go to our community site. Michael and Nathaniel are often there answering people’s questions about anything related to Kapacitor.
[silence]
Chris Churilo 29:34.073 And I will put the slides in our SlideShare area. I’ll write myself a note to remember to also put that link in the community site as well as in the email that goes out later on.
[silence]
Chris Churilo 30:01.873 Okay. Looks like the questions have slowed down, but feel free to reach out to us if you do have any further questions, and especially reach out to me if you have particular topics that you’d like us to cover in our trainings. These are meant to help you guys. So I’d love to hear any kind of feedback about what you want to hear in the next set of trainings. So with that, no, we don’t have the Slack channel. We did, but we really want to make sure that we get everyone going to the community site. Sorry, that was a question that came from Aman. Just because it was a lot easier for us to manage all questions in one area. We were starting to not do such a great job of answering questions when we were going to many, many different channels. And so we decided to consolidate everything. So with that, just want to say thank you to everybody, and I will get to posting this recording at the end of the day, and then I’ll send out that email with the link to community and with the link to the slides for everyone to take a look at. All right. And have a wonderful rest of your day, and thank you so much.
Michael DeSa 31:19.876 Thank you.