Kapacitor: Service Discovery, pull and Kubernetes
In this webinar, Jack Zampolin will be sharing how Kapacitor’s new service discovery and scraping code allows any service discovery target that works with Prometheus to work with Kapacitor. Combined with a TICKscript, you will be able to use Kapacitor to monitor Prometheus scrape targets, write data into InfluxDB, and perform filtering, transformation, and other tasks. With Kapacitor’s user defined functions (UDFs), it becomes trivial to pull in advanced anomaly detection and custom logic.
Watch the Webinar
Watch the webinar “Kapacitor: Service Discovery, pull and Kubernetes” by clicking on the download button on the right. This will open the recording.
Here is an unedited transcript of the webinar “Kapacitor: Service Discovery, pull and Kubernetes.” This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.
• Chris Churilo: Director Product Marketing, InfluxData
• Jack Zampolin: Developer Evangelist, InfluxData
Jack Zampolin 00:01.980 Okay. Thank you very much, Chris. So today we’re going to be talking about how to do Kapacitor Service Discovery on Kubernetes. So I’ve dropped some links in the sidebar. We’ll need each one of these for a various different part of this demo and install here. So let’s just pop right into it.
Jack Zampolin 00:26.294 So what are we going to talk about today? We’re going to talk about the scraping configuration for Kapacitor, how to set up Kapacitor in your cluster. We’re going to deploy a tool called kube-state-metrics that gives you metrics on each of the different objects in your Kubernetes cluster. So how many pods do I have? Which services are available? That kind of thing. We’re also going to go through normalizing Prometheus-style metrics. So just a quick note, there’s a difference between the Prometheus metrics format, which is essentially a long key with some tags and a single value, and the InfluxData format, which is a key, tags, and then multiple values. So the canonical example of that would be something like CPU, where you would have CPU, CPU busy, CPU percent available, and a variety of different metrics that are all under what we would call a measurement. Those are different metrics in Prometheus. So there is a process needed to change those; luckily, Kapacitor has a very powerful scripting language that allows us to do that. And then finally, I’m going to show you in Chronograf sort of what kind of metrics we’re getting and a couple of different ways to visualize that. Now one thing that’s going to be very important as I go through this, please, please, please ask questions. I will try and stop and do those. So if we peek over at the Kapacitor tab, you can see these slides that I’ve uploaded. I’m going to be paging through those slides and I’ll also be sharing my screen to do a demo. So again, please ask questions if you have any and let’s go ahead and get started.
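To make the format difference concrete, here is a sketch of the same data in both formats. The metric names and values are illustrative, not taken from a real scrape:

```
# Prometheus exposition format: one value per (long) metric name
cpu_usage_busy{host="node1"} 12.5
cpu_usage_idle{host="node1"} 80.1
cpu_usage_system{host="node1"} 7.4

# InfluxDB line protocol: one measurement, with tags and multiple fields
cpu,host=node1 usage_busy=12.5,usage_idle=80.1,usage_system=7.4
```

The normalization step later in the demo is essentially the mapping from the first shape to the second: a common stem becomes the measurement, and the suffixes become fields.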
Jack Zampolin 02:17.939 So these are the repositories that you’re going to need and tools that you’re going to need to follow along in this demo. I’ve dropped in the links over in the chat channel. There’s Helm, the kapacitor-cli which you can get by downloading Kapacitor at influxdata.com/downloads, tick-Charts, kube-state-metrics, and the prometheus-metrics-normalizer repo as well. So if you’re looking at all these things to download, you might be thinking, “What am I going to download or what did I just download? He just asked me to do a bunch of stuff, and I have no idea what’s going on.” So let’s walk through each one of those real quick. Helm is a package manager for Kubernetes. If you’re familiar with Kubernetes, you’re familiar with the fun YAML files that you get to write to create pods, and services and deployments in your cluster. Helm is a way to template those and remove some of the drudgery of creating multiple template files for each one of your deployments. A standard Kubernetes deployment might be a deployment with a service. And it might also have a config map associated with it. The number of objects grows pretty quickly, along with complexity. So Helm helps you manage some of that complexity. You can think of it as apt-get for your Kubernetes cluster.
Jack Zampolin 03:48.195 Next repo is tick-Charts. The packages for Helm are called Charts. I have a chart in that repository for each one of the InfluxData products. So this is just an easy way to install the InfluxData TICK stack in your Kubernetes cluster. Next is kube-state-metrics. This is a service that essentially polls your Kubernetes etcd cluster. It pulls out the current state of the cluster—which pods exist, which services exist, that kind of data—and exposes those on a /metrics endpoint in the Prometheus style to your cluster. It’s a nice additional source of metrics, and it’s an easy way to know that this automated service discovery and scraping is working for you.
Jack Zampolin 04:49.743 Finally, there is the prometheus-metrics-normalizer repository. So we’re going to be doing this Prometheus-style scraping through Kapacitor, which is the stream processing engine for the InfluxData stack. In the prometheus-metrics-normalizer repo is a set of tick scripts, which are the scripts that you need to write to make Kapacitor do different units of work that normalizes these metric names and spreads them out into measurements with multiple fields. And it sounds a little bit more intimidating than it actually is. You just got to run one command. And it’ll define a bunch of tick scripts on your Kapacitor instance for you and sort of take care of this automatically, or at least that’s the hope. So let’s go ahead and dive deeper in.
Jack Zampolin 05:50.463 So the first thing that you’re going to want to do is install kube-state-metrics. So pop into your kube-state-metrics folder and go ahead and do that. So I’m going to go ahead and share my screen here real quick. There we go, okay. So here’s kube-state-metrics, it’s essentially a series of Kubernetes deployment files. If you’ll notice there’s a cluster role binding and a cluster role file in there. If you don’t have the newer APIs enabled, or you don’t have RBAC enabled on your cluster you don’t really need to worry about those. And you can deploy those with kubectl apply -f kubernetes and that’ll go ahead and do that for you. So if you’ll notice there I got those errors for the ClusterRoleBinding and the ClusterRole, you can ignore those.
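The deploy step described above boils down to a single command, run from inside the kube-state-metrics repo (the folder name follows that repo’s layout at the time of the demo):

```shell
# Apply every manifest in the kubernetes/ folder of the repo.
# If RBAC is not enabled on your cluster, the ClusterRole and
# ClusterRoleBinding errors can be safely ignored.
kubectl apply -f kubernetes/
```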
Jack Zampolin 07:14.014 Okay. Now that will generate in your cluster the deployment and the service to expose that. Now, let’s take a step back and talk about the scraper configuration within Kapacitor. So this code will live in the Kapacitor configuration file. That’s taken care of, for you, automatically by Helm, but I just want to talk a little bit about what are the different levers that you have to pull here and sort of how this works. So there are two different sections that you need to add to your Kapacitor config file for each different Kubernetes resource that you want scraped. So in the configuration file in tick-Charts, I have all four enabled, so you’ll see these two sections here repeated four times with the word pod here that you can see in the slide, switched out for the various other resource types, so node, service, and endpoint. There are two pieces to this. There’s the scraper, and then there’s the configuration for the individual discovery type. So here’s the discovery configuration, Kubernetes, we have to give it an I.D., we have to say that it’s enabled. In-cluster tells it to use the application default credentials as its authentication mechanism. And then, this is the resource that we’re going to be discovering. Over there in the scraper, we need to reference that discoverer-id, so you can see Kubernetes pod there, Kubernetes pod up there, good to go. The discoverer-service is just kubernetes. So the way that we have this implemented in Kapacitor is that we’ve imported the Prometheus service discovery model, so if there is a service discovery type that’s supported in Prometheus—Consul, AWS, they have over a dozen different discovery mechanisms implemented—you will be able to use each one of those. This particular demo is just showing the Kubernetes example.
You need to give the scraper a name, and then you need to tell the scraper where the data is going to be, what the data’s going to look like, so it’s going to come into the prometheus-raw database, the metrics-path you configure. And then the last thing that you would want to configure is the scrape-interval and timeout. So I think the Prometheus default gives you a one-minute scrape interval, I’ve turned that up to 10 seconds just so that we’re getting a few more metrics here and it makes the demo easier to follow. The scrape timeout here is also 10 seconds.
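A minimal sketch of the two config sections just described, in Kapacitor’s TOML configuration format. The IDs and database names follow the demo, but check the kapacitor.conf that tick-Charts generates for the exact values:

```toml
# Discoverer: find pods via the Kubernetes API, authenticating
# with in-cluster application default credentials.
[[kubernetes]]
  id = "kubernetes-pods"
  enabled = true
  in-cluster = true
  resource = "pod"

# Scraper: pull /metrics from each discovered pod every 10 seconds
# and write the raw Prometheus-style data into prometheus-raw.
[[scraper]]
  enabled = true
  name = "kubernetes-pods"
  discoverer-id = "kubernetes-pods"
  discoverer-service = "kubernetes"
  db = "prometheus-raw"
  rp = "autogen"
  type = "prometheus"
  scheme = "http"
  metrics-path = "/metrics"
  scrape-interval = "10s"
  scrape-timeout = "10s"
```

As noted above, this pair of sections is repeated with pod swapped out for node, service, and endpoint.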
Jack Zampolin 10:29.292 So, next you want to go ahead and install Kapacitor in your cluster. So I’m going to go ahead and screen-share and walk through that. After you do that install, it’s going to drop some [inaudible] so make sure to grab those commands and run them in a couple separate terminal windows, and I’ll show you how to do that. So, this is the tick-charts repo. So we’re going to deploy the Kapacitor chart. So it’s helm install, we can give it a name, I’m going to call it kapa, we’re going to give it a namespace to deploy into. I’m going to deploy this in the tick namespace. In order for the chart to successfully deploy, we need to point it at an InfluxDB instance, so I’m going to go ahead and do that as well. So we need to set the influxURL equal to—I have a little [inaudible] running in my cluster, if you have an InfluxDB Cloud instance you could easily set the InfluxDB Cloud URL there as well. Or any other Influx instance that’s reachable from inside your cluster. So just make sure you have one. There is a chart for setting up an Influx instance in the exact same directory here. So that is pretty easy to do, db-influxdb.prod. That’s where mine is right now. And it’s at port 8086. And we want to deploy the chart Kapacitor and run that. Okay. So Helm will spit out a bunch of output here. It tells you what it did. It spun up some resources: a config map, a service, and a deployment. It tells you where it’s addressable from in cluster. This port-forward command will forward the 9092 port on that pod to localhost 9092 on your computer. This makes it extremely easy to use the Kapacitor CLI to interface with that Kapacitor that’s in your cluster. So go ahead and run that in a separate terminal window. I’ve got one running right here for one that I have in a cluster. But let’s go ahead and run this. Okay. And then, if you run kapacitor list tasks, you should get some output there or kapacitor stats—okay. Yeah. We’ll do that next.
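Strung together, the install looks roughly like this. The release name, namespace, and InfluxDB URL are the ones from the demo, and the Helm v2 flags reflect the version in use at the time; substitute your own values:

```shell
# Install the Kapacitor chart from the tick-charts repo, pointing it
# at an InfluxDB instance reachable from inside the cluster.
helm install --name kapa --namespace tick \
  --set influxURL=http://db-influxdb.prod:8086 \
  ./kapacitor

# Forward Kapacitor's API port locally so the kapacitor CLI can
# talk to it; the exact pod name comes from the helm output.
kubectl port-forward --namespace tick kapa-kapacitor-<pod-suffix> 9092:9092
```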
And then, you also do need to check the logs here just to make sure it came up okay. That’s a quick and easy one. And you can see that it’s started registering those scrapers, the HTTP API has come up, and that Kapacitor instance is ready to go.
Jack Zampolin 13:56.410 Okay. So getting back to the slides. We need to check the logs and set up port forwarding. I showed you how to do that just now. So the next thing that you want to do is you want to see the metrics streaming in. So I’ll go ahead and show you what that looks like. But the command that you’re going to use is kapacitor stats ingress. Just going back and talking about Kapacitor for a quick second, you can think of Kapacitor as having a front that metrics go into and a back that transformed metrics, or events, come out of. So the ingress stats are how many metrics are coming in. And the metrics that the scraper is pulling in are going to be exposed there. So let’s go ahead and take a look at that output.
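The two checks just mentioned are quick one-liners (the pod name and namespace follow the demo setup):

```shell
# Confirm the scrapers registered and the HTTP API came up.
kubectl logs --namespace tick kapa-kapacitor-<pod-suffix>

# Watch metric counts on the ingress (front) side of Kapacitor;
# the scraped data shows up under the prometheus-raw database.
kapacitor stats ingress
```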
Jack Zampolin 15:02.436 Okay. Okay. So you can see here the stats—you see those prometheus-raw metrics coming in. And you can see the longer metric names. So as I was saying earlier, the Prometheus metrics and the Influx metrics are different. These apiserver metrics are a great example. So we have an apiserver request as a count of number of requests. And then, we have some metrics related to latencies as well. So if you take the stem, apiserver_request, you can imagine the number of fields under the same measurement: count, latencies_bucket, latencies_count, latencies_sum, and so on as fields underneath that top-level measurement. The series of TICK scripts that we’re about to define does exactly that.
Jack Zampolin 16:18.698 Getting back to the presentation, and I just want to pause here real quick. I don’t see anything in the Q&A. Does anybody have any questions or comments? Am I going too fast? Is this understandable for everyone? I just want to make sure everyone’s good.
Jack Zampolin 16:45.283 So Peter asks, “Do we have some subscriptions from InfluxDB too, here?” Yes. The way that Kapacitor’s configured with the Helm Chart, it will automatically create the subscription on the InfluxDB instance that you specify with set influxURL. I have an in-cluster InfluxDB that has some other stuff streaming into it. So if I run—I can show you real quick. If I run kapacitor stats ingress, so this is that new one that I just spun up, you’ll see a number of other measurements coming into it. So I have Telegraf running and a few other things. Does that help answer your question? Awesome. Good deal.
Jack Zampolin 17:40.824 Okay, so back to the slideshow, and again, if you have any questions please drop them in the chat. I’m happy to answer those. The next thing we need to do is normalize metrics and output them into InfluxDB. So before you do this, Kapacitor needs to have—all right. Sorry. Just backing up just a tad. When Kapacitor is scraping this data, it’s currently in the front of Kapacitor, but it doesn’t go out the back by default. So it’s not going to write that data to Influx without you explicitly telling Kapacitor to write that data to Influx. So the prometheus-metrics-normalizer will take care of that, but conceptually, the data is coming into the prometheus-raw database. We’re going to process it, put it back in the front of Kapacitor going into the Prometheus database, and then it’s going to go out the back to Influx. So there needs to be a place in Influx to put this data, the Prometheus database. So whatever InfluxDB instance you’re using, you need to connect to it and create the Prometheus database on that instance. Next, we want to run the metrics normalizer and I’m going to go ahead and do that. So let’s share the screen here. All right. Okay.
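Creating that database is one command with the influx CLI. The host and port here are my in-cluster instance from the demo; point it at whichever InfluxDB you configured:

```shell
# Kapacitor won't create the output database for you, so create it
# on the target InfluxDB instance before defining the tasks.
influx -host db-influxdb.prod -port 8086 -execute 'CREATE DATABASE prometheus'
```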
Jack Zampolin 19:26.841 Okay. So here we are in the Prometheus Metrics Normalizer. And just to walk through this little particular piece of infrastructure here. At its core, it’s just a simple shell script that defines a bunch of Kapacitor tasks. So the one that takes the data and writes it to Influx is here. It’s the simplest possible TICKscript, stream from—and you pass in the database retention policy when you define it. So this is just writing an entire database retention policy to InfluxDB. And then the next TICK script that gets used down here is a templated TICK script. In this presentation, which I have some links to which we’ll send out a copy of later, I do have some links to the documentation about Kapacitor templated TICK scripts. You can go look that up on docs.influxdata.com right now if you’d like, but it is pretty straightforward. Essentially, you have a TICK script that looks like this, and you pass in a JSON blob that sets certain values within this TICK script. The define function here just runs through an entire folder of JSON blobs and applies them to this templated TICK script and defines a bunch of scripts on your Kapacitor instance. So let’s go ahead and do that. I don’t currently have any tasks executing on this Kapacitor instance, so let’s go ahead and run define. And then, the ones you want to define are kubernetes. Make sure there’s no trailing slash there, or else it won’t work. Bash is great that way. This is just going to take a second to loop through each one of those. We’re sending API calls back and forth pretty quick. You can see those connections getting handled over in your port forwarding. And that’s done. So the next time that I run kapacitor list tasks, you’re going to see a lot of tasks [laughter]. Now from the kapacitor stats ingress output that I showed you earlier, each one of these tasks is essentially named for a common stem for those metrics. We talked about apiserver requests, and that had a number of fields beneath it.
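A sketch of the two kinds of TICK scripts just described: the pass-through output task and the templated normalizer. The variable names and node chains here are illustrative, not the exact contents of the prometheus-metrics-normalizer repo:

```
// 1) Pass-through task: stream everything in a database/retention
//    policy out the back of Kapacitor into InfluxDB. The db and rp
//    vars are passed in when the task is defined.
var db string
var rp string
stream
    |from()
        .database(db)
        .retentionPolicy(rp)
    |influxDBOut()
        .database(db)
        .retentionPolicy(rp)

// 2) Templated normalizer (separate template): pick one long
//    Prometheus metric name out of prometheus-raw, rename its value
//    to a field, and write it under the common measurement stem.
var stem string   // e.g. 'apiserver_request'
var field string  // e.g. 'latencies_sum'
stream
    |from()
        .database('prometheus-raw')
        .measurement(stem + '_' + field)
    |eval(lambda: "value")
        .as(field)
    |influxDBOut()
        .database('prometheus')
        .measurement(stem)
```

The JSON blobs mentioned above would supply values like stem and field, one blob per metric family, which is why defining them in a loop produces so many tasks.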
Each one of these is a similar group of metrics. And then finally, you can see that they are all running on the prometheus-raw database, as well, and they’re all outputting data to the Prometheus database, and then that prometheus-out task that I showed you, the one that streams into influxDBOut, is writing the transformed data to Influx, so you don’t have any of this really messy, difficult-to-work-with data in your database.
Jack Zampolin 22:43.802 So hopping back to the webinar. So when you type kapacitor stats ingress | grep “prometheus “, if you put a space after prometheus, it will exclude the prometheus-raw database. You will see those and you can see those measurements in Influx and I’ll go ahead and show you that right now.
Jack Zampolin 23:20.265 So you see each one. And if you’ll notice each of these measurement names corresponds to a Kapacitor task. So pretty straightforward one-to-one mapping. I also have my Influx CLI connected to the instance that this is writing to, and if I run show measurements, you should see that data coming in here, and you do. So these are all of the different measurements that you’re going to get. And popping back to the webinar, demo time is done [laughter].
Jack Zampolin 24:22.953 So I’m going to go ahead and show you how to graph this in Chronograf. So I’ll go ahead and share my screen. And again, if you have any questions—okay. Peter asks, “Do labels end up in measurements?” Peter, what do you mean exactly by that? Can you give me a little bit more information? If by labels, do you mean tags? Because that’s the similar concept in Prometheus. Labels from pod metadata? Yes, depending on the measurements and the way Prometheus collects those measurements from the source. If the pod metadata is included as a tag in Prometheus, it will be included as a tag in Influx, and I’ll show you that here. So I have a Chronograf instance that is attached to this database, so we’re going to go ahead and jump into the data explorer. So for example, in the apiserver request measurements, we’ve got those field names that I mentioned earlier. And then, you can see all of the different tags that come in that way. So clients, resource types—so in the event of the API server, it’s answering questions about things like config maps, deployments. And you can see the requests grouped by each one of those. Now, in the example of maybe a kube_daemonset, it gives you the metadata from the DaemonSet. So I have different—pardon me, DaemonSets running on this cluster. And you can see each one of those there. So you can group by the DaemonSet. And in the kube_pod, you get the container ID, pod ID, and then a bunch of other labels in there as well. So if you’re going to get this data in Prometheus, you’re going to get it in Influx. Awesome. I’m glad that helps out, Peter.
Jack Zampolin 26:48.956 So just a quick run-through, some of these common metrics that come out here, etcd metrics. So the number of requests and the request latency from your etcd cluster. Heapster metrics—so Heapster runs an exporter. There’s also scrape metrics. So in this case, in Prometheus, these measurements monitor the Prometheus scraping. Here, they’re monitoring the Kapacitor scraping, so how long it took to scrape each instance. So you can see here each of the different targets are coming in below the 10-second timeout. The only one that’s not is kube-dns. So that’s an issue I’m working on troubleshooting. But everything else is coming in under a second. And it looks like there’s one that’s right around two seconds. So this is just as performant as it’s going to be in Prometheus as well. One of the things that is not included in the default Kubernetes metrics is nginx metrics. If you’ve installed the nginx ingress, the Helm chart for it, which if you haven’t, I would highly recommend. It’s a great way to manage ingress and egress from your cluster, especially by providing private, authorized access to services inside your cluster and then selectively exposing them to the outside in a very declarative manner. That chart has a /metrics endpoint for nginx. So those nginx server metrics, like the number of bytes that you are sending out at a time, the number of connections coming into the server, all kinds of fun stuff like that. Yeah. So I’m going to go ahead—mute me? I’m going to go ahead here and pause for questions.
Jack Zampolin 29:13.077 Art’s just asked, isn’t it possible to scrape Prometheus targets with Telegraf for simple scenarios, and if so, why do we need to use Kapacitor? Art, you are absolutely correct. It is possible to scrape Prometheus targets with Telegraf and those measurements look like this. So you’ll see the full thing. So why do we need to use Kapacitor? So with Telegraf, you do need to be very declarative. You need to say exactly where things are. And in dynamic environments, sometimes that can be difficult. Also, when new services come online, you have to tell Telegraf directly as part of your update process that those services exist, restart the process and add the configuration in order for Telegraf to properly scrape them. The discovery part of this obviates all of that. Kapacitor will be continually checking for new targets that the cluster manager, in this case, Kubernetes, offers. And it will scrape them as they come online. So you can imagine in a SaaS—if you’re running a SaaS service and you’re spinning up services for each one of your customers, with Telegraf, you would have to explicitly update each time you brought a new customer up or down. In the Kapacitor case, that work is completely eliminated. Kapacitor will automatically discover them when they’re there and notice—well, when they’re gone, it won’t be able to scrape them anymore. So those metrics will come online and go offline properly. Does that help answer your question, Art? Awesome.
Jack Zampolin 31:10.020 Oh, yeah. Hit me. Also while Art is asking one more question, any other questions are absolutely welcome. Art also asks, “Does it only work with Kubernetes for the service discovery parts?” No. It works for a number of different providers. So for example, any of the Prometheus service discovery options that—here, let me post a link here. So any of the options in the Prometheus configuration documentation, such as Azure, Consul, DNS-based service discovery, EC2, file-based service discovery—so Kapacitor can read a number of targets from a file and scrape each one. And then, you can update that file periodically, GCE, so in the Google Compute Engine environment, Kubernetes, Marathon, Nerve, Serverset, Triton, that’s Joyent’s data center offering. You can even statically configure targets. But there’s a number of different service discovery configurations and we support each one of those in Kapacitor. This demo only shows the Kubernetes one obviously. But there are more.