Telegraf is a plugin-driven server agent for collecting and reporting metrics, and there are many plugins already written to source data from a variety of services and systems. However, there may be instances where you need to write your own plugin to source data from your particular systems. In this session, Jack Zampolin will provide you with the steps on how to write your own Telegraf plugin. This will require an understanding of the Go programming language.
Watch the webinar “Write Your Own Telegraf Plugin” by clicking on the download button on the right. This will open the recording.
Here is an unedited transcript of the webinar “Write Your Own Telegraf Plugin.” This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.
• Jack Zampolin: Developer Evangelist, InfluxData
Jack Zampolin 00:00.621 Thank you very much, Chris. Today, we’re going to be talking about how to write a Telegraf plugin. So one of the first things I’d like to point out is that there’s an excellent guide over on Telegraf. If you poke over to... let me show my screen here real quick. Okay. If you poke over to the influxdata/telegraf repo, there’s an excellent CONTRIBUTING.md guide. There’s a step-by-step example for writing each of the plugin types, and this is the source for this webinar. I’ve also written and contributed about four or five different Telegraf plugins, and this has been my best friend throughout that whole process. So I would encourage you to go over and check that out. Maybe have it open while we’re going over this. It has step-by-step instructions on how to do each of the plugins. So that’s going to be a big help. But we’ll walk through each of those steps, explain them, and maybe help you write your own. So let’s go ahead and get started.
Jack Zampolin 01:20.627 Let’s just step back for a second. What are we going to be discussing today? What is Telegraf? So I’m going to give you a brief overview of Telegraf, talk about the architecture. What are some examples of Telegraf plugins? So what are some things that you can do? What are the different types of Telegraf plugins? How do people generally use Telegraf? How do I use Telegraf? What does it look like when it’s deployed? What is the plugin architecture in Telegraf? And then finally we’ll go step by step through how to write a Telegraf plugin.
Jack Zampolin 01:56.193 So what is Telegraf? It’s an agent written in Go, just like the rest of the TICK Stack, that is, InfluxDB, Kapacitor, Chronograf, and Telegraf. And it collects metrics from the system that it’s running on. So for example, the memory and CPU plugins, those look at files on disk, read values from them, and then push them to InfluxDB. It can also pull metrics from common pieces of infrastructure. So for example, Redis or Postgres, we have a plugin for that. Telegraf will make an HTTP request to an endpoint on that process, pull some of the data down, and then transfer that to InfluxDB. So the same process can also be used for third-party APIs. And Telegraf can write to many different output sinks.
Jack Zampolin 02:57.865 In addition to InfluxDB, we have integrations with many common queueing systems, for example, Kafka and MQTT. Telegraf acts as both a producer, so it can emit metrics to a Kafka topic, and a consumer, so it can read them off a Kafka topic as well. This makes hooking into queueing systems in your infrastructure for a metrics pipeline very easy. So if you need to buffer metrics upstream of the database, maybe for batching purposes or additional visibility in your metrics pipeline, that’s a great way to do it.
Jack Zampolin 03:41.482 Telegraf is designed to have a very minimal memory footprint. So it only collects data on the collection interval and tries to hold as little in memory as possible. It is very parsimonious with your CPU resources as well. There’s also a plugin system that’s designed to be easy to write and easy to contribute to the project. So if you have a data source that we don’t implement, we try to make it easy for you to get that into Telegraf. And just a quick note while I’m thinking about it, if you have any questions during the presentation, please drop them in the Q and A box. I’ll try to get to them during the presentation if I see them. And if not, I’m happy to answer questions at the end. And at the end, not just Telegraf questions, but any sort of InfluxData related questions. So all right.
Jack Zampolin 04:37.288 So we talked about a couple of the different plugin types earlier. And I just want to go over a few common ones in a little bit more detail. So one of the types of plugins we didn’t cover earlier was a listener plugin. So the StatsD plugin would be a great example. That plugin opens a server on a port and listens for data formatted in a particular way and then runs that through Telegraf. So the StatsD plugin listens for data in StatsD format and outputs that. There are a lot of different StatsD server implementations out there, but this one will convert that StatsD-formatted data directly into InfluxDB. It’s used in production in a large number of enterprises and is really quite capable from a performance aspect.
Jack Zampolin 05:33.328 The Postgres plugin is a great example of a database monitoring plugin. It pulls built-in data from Postgres using the pg_stat_database view as well as some information on ingest from the [inaudible] views. The Apache and Nginx plugins are great for collecting those web server metrics. So if you want to know how many requests per second you’re getting or any of those other really web server specific stats to get a really gross monitor of traffic into your environment, these are great. I’ve got the Apache one set up on a couple of projects that I run internally, and it really does give great stats.
Jack Zampolin 06:26.742 If you’re running Windows, Telegraf has you covered as well. There’s an excellent Win Perf Counters plugin for collecting system metrics or anything from those Windows Performance Counters. And we have a number of folks using this in production at large scale on Windows machines. So if that CPU plugin, the standard one, doesn’t get you what you want on a Windows machine, try out the Win Perf Counters plugin. It will probably get what you want.
Jack Zampolin 06:56.821 There’s also a Webhooks plugin as well, not just GitHub Webhooks, but we have a couple more of those at this point. That, again, like StatsD, will open a server and listen for incoming Webhooks. So the GitHub example there will listen for events coming off of GitHub and persist them into Influx. This is pretty cool. I have it set up for InfluxData, our GitHub organization. So any time people write a pull request comment or push up new code, I store those events in Influx. And that makes them really easy to graph. So if you want to get a good idea of your star graph, there are some tools out there online. And obviously, GitHub’s API being open source, it has been implemented by a number of folks. But if you’d like to do it yourself and store the data yourself, this is a very easy way to do that. You just need some credentials and you’re ready to roll.
Jack Zampolin 08:02.557 So how do I use Telegraf? We’ve sort of talked about how the plugin system works, but how do you use it? The easiest way to get started is to download the binaries at influxdata.com/downloads. And once you’ve unpacked the tarball, you can generate a config with Telegraf very quickly. This is one of the features that I, personally, like most in Telegraf and I use all the time. The binary itself will generate the configuration files based on the input and output plugins you want. So if you can see the code there, you say telegraf -sample-config, and then you add -input-filter and choose any of the almost 100 inputs that Telegraf has. And you can separate those with a colon for multiple inputs. And then you add the -output-filter, and you pipe that into a config file. If you are just outputting to InfluxDB on localhost and using the defaults for your different inputs, you can just run that config file immediately after that. This makes it really easy when you’re jumping into a new environment trying to figure out how to configure Telegraf. Telegraf generates the configuration. And generally, those options that are given in the generated config, just filling them in with the defaults and maybe a couple of things specific to your environment will get you a working configuration and get you visibility into whatever you’re trying to monitor very quickly.
Jack Zampolin 09:43.258 So Telegraf will normally run as a Linux service. So if you want to generate a config and have Systemd take care of that, you would write that config to /etc/telegraf/telegraf.conf and then say, service telegraf start, or systemctl start telegraf, either one.
Jack Zampolin 10:10.280 So now that we’ve got an overview of Telegraf, it’s time to dive into the meat of the talk, which is writing a plugin. So what do you need to write a Telegraf plugin? You need Git installed on your computer to pull down the repo. And then you need Go, the language, installed. Go 1.5+ is desired. I think we currently compile Telegraf with Go 1.8, but we’re not really using any of those new features yet, so if you have an older version, that’s okay. And then the other thing you need is a desire to write a Telegraf plugin because if you don’t, you’re probably not going to finish it.
Jack Zampolin 10:52.611 So what plugin are we going to write today? There’s a plugin that exists in Telegraf called Trig. It just emits a simple sine wave to InfluxDB. Regan Kuchan, our documentation expert here, contributed it. And it’s an excellent example of how to write a Telegraf plugin because it’s very simple. So we’re going to go through writing that plugin and what the workflow looks like and how to do that on your local machine. So let’s do it.
Jack Zampolin 11:36.263 So to get started, you need to pull down Telegraf. So if you’ve got Go installed, you can just go get it. You could also git clone that repo as well, and then change directory into Telegraf. If you’re going to contribute to the repo, it’s generally good if you checkout a new branch. And if you’re going to contribute, you might want to fork it as well. But just for local development, a simple git checkout to my sweet plugin branch will do. And then if you type make, that will pull down all the dependencies for Telegraf and make a binary locally.
Jack Zampolin 12:25.451 Telegraf does make it extremely easy to build the project, which is very nice, especially when you’re building plugins like this, because you are going to have to recompile relatively frequently and test the new binary. So when you call make, it will generate a new Telegraf binary in your Go bin directory, so $GOPATH/bin/telegraf. If you’re not familiar with Go development, the GOPATH is the path under which all of your Go code is stored. There are a few different folders in there, one for the source code that’s organized by the different Git providers. So there’s a GitLab, a GitHub, Google’s got their own Git repos up there as well, and then organization code. And then when you compile Go programs, the binaries live at $GOPATH/bin, and then the repository name. So when you generate the binary here, it will generate that Telegraf binary there.
Jack Zampolin 13:41.696 So when you’re creating a plugin, you’re going to need three different... or no, sorry [laughter]. Really, two different files. You’re going to need a directory to store your plugin in, and that’s under plugins/inputs. You’re going to make your trig directory there. The title of the directory needs to be the name of the plugin. So our plugin is named Trig. We’re going to make the trig directory. And then inside that directory, there needs to be a trig.go. So if we were making a Mesos plugin, it would be mkdir mesos and then there needs to be a mesos.go file in there. Also, if you want to contribute that plugin upstream to Telegraf, you do need to write tests for your plugin. So later we’ll create that test file and walk through that as well.
Jack Zampolin 14:50.680 So writing a Telegraf plugin requires you to implement a Go interface. Interfaces in Go just require that the struct has certain methods with very specific function signatures. So the input interface for Telegraf is extremely small, which makes it very easy to get started with the project. The first two methods just return strings [laughter], and they simply describe what your Telegraf plugin is doing. So SampleConfig needs to return any configuration variables that you’re going to have that are either required or optional for your plugin. And this is how Telegraf generates those configuration files that we talked about earlier. Each of the plugins stores its own sample config, and Telegraf will just dynamically generate that as you call it. So this is required for that feature. So any configuration parameters that you need, you’ll add here. The Description is just a human-readable, one-sentence description of the input. This makes it so your users can understand a little bit more context about your plugin. That’s also used in the generation of the config files. And then finally there’s a Gather function, and this is where you do all your work. If you’re familiar with Telegraf, it runs on an interval. So every X number of seconds, and that number defaults to 10 seconds, Telegraf will run the Gather function of each one of the configured plugins. And that accumulator there, we’ll talk about it in a little while.
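The interface described here can be sketched in a few lines of Go. To keep the example runnable without the Telegraf repository, `Accumulator` and `Input` below are local stand-ins modeled on the description in the talk, not the real `telegraf` package types, and `Noop` is a hypothetical plugin used only for illustration.

```go
package main

import "fmt"

// Accumulator is a local stand-in for Telegraf's accumulator, reduced to the
// one method used in this talk (assumption: the real interface is larger).
type Accumulator interface {
	AddFields(measurement string, fields map[string]interface{}, tags map[string]string)
}

// Input lists the three methods an input plugin must provide.
type Input interface {
	SampleConfig() string
	Description() string
	Gather(acc Accumulator) error
}

// Noop is a hypothetical plugin that satisfies Input but collects nothing.
type Noop struct{}

func (n *Noop) SampleConfig() string         { return "  ## no options\n" }
func (n *Noop) Description() string          { return "Does nothing; demonstrates the interface" }
func (n *Noop) Gather(acc Accumulator) error { return nil }

// Compile-time check: the build fails if *Noop stops satisfying Input.
var _ Input = (*Noop)(nil)

func main() {
	var in Input = &Noop{}
	fmt.Println("# " + in.Description())
	fmt.Print(in.SampleConfig())
	fmt.Println("gather returned:", in.Gather(nil))
}
```

The `var _ Input = (*Noop)(nil)` line is a common Go idiom for catching a missing or mistyped method at compile time rather than at startup.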
Jack Zampolin 16:47.107 So in that trig/trig.go file, you’re going to want to copy in some boilerplate. You’ll find this boilerplate in the Telegraf CONTRIBUTING.md file. And just copy that in and start your work from there. So there are some Go imports on the top. You do need to import Telegraf itself as well as the other inputs. There’s some code there that you’re going to be using. You’re going to need a struct that’s going to be the base object for your plugin. Go’s not necessarily object-oriented, but if you’re familiar with other object-oriented languages, this is going to be very familiar to you. And then we need to implement each of the methods from the interface so that Trig can satisfy that. So there’s a SampleConfig method that returns a string. There’s a Description method that returns a string, and then that Gather function where we’re going to do our work. And then finally we need to tell Telegraf that our Trig plugin exists, and you do that in the init function. This will get run when Telegraf starts up if the plugin is enabled. So we just need to return an instance of our struct from that.
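A minimal sketch of that boilerplate shape follows. The `registry` map and `Add` function are hypothetical stand-ins for Telegraf's input registry so that the example compiles on its own; in the real repository the plugin would import the Telegraf packages instead, and the exact registration call should be taken from CONTRIBUTING.md.

```go
package main

import "fmt"

// Accumulator and Input are local stand-ins for the Telegraf types.
type Accumulator interface {
	AddFields(measurement string, fields map[string]interface{}, tags map[string]string)
}

type Input interface {
	SampleConfig() string
	Description() string
	Gather(acc Accumulator) error
}

// registry and Add are hypothetical stand-ins for Telegraf's input registry.
var registry = map[string]func() Input{}

// Add records a constructor under the plugin's name.
func Add(name string, creator func() Input) { registry[name] = creator }

// Trig is the base struct for the plugin; the methods are empty stubs for now.
type Trig struct{}

func (t *Trig) SampleConfig() string         { return "" }
func (t *Trig) Description() string          { return "" }
func (t *Trig) Gather(acc Accumulator) error { return nil }

// init runs at program start and tells the registry the plugin exists.
func init() {
	Add("trig", func() Input { return &Trig{} })
}

func main() {
	creator, ok := registry["trig"]
	fmt.Println("trig registered:", ok)
	if ok {
		fmt.Printf("creator returns %T\n", creator())
	}
}
```

Registering a constructor function rather than an instance lets the agent create a fresh plugin value for each configured block.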
Jack Zampolin 18:22.054 Any questions, just while I’m—oh. [inaudible] asks, “Are Oracle sources available?” I don’t know if there’s currently an Oracle plugin right now. It’s been a requested feature for a while. So maybe if you want to contribute it or have an idea how to do it, please open an issue on Telegraf and we can take the discussion there. Okay. Awesome. And again, while I’m going through this, if you have any questions, please just drop them to the Q & A.
Jack Zampolin 19:06.387 So the next thing you need to do is add your plugin to the list of all the plugins that are in Telegraf. That’s in a conveniently named plugins/inputs/all/all.go file [laughter], so you’ll see a long list of Go imports there, and you just add a new line for your plugin. So ours is called Trig. We’re going to add that Trig line. This ensures that Telegraf can import your plugin and knows where the code is and how to run it. So this is a very important step. If you’re contributing a plugin upstream, you’re probably going to have merge conflicts on this file because anyone who’s contributing a plugin has to edit it, so it can be a little bit of a pain in the neck. So just make sure this is the last thing that you do when you’re making a plugin. But when you’re testing locally, this is required to make it work.
Jack Zampolin 20:16.977 Next, we’re going to write the sample config function. So in that sample config, you need to return any configuration variables that you’re going to be using. So we’re going to be using one called amplitude, and you can see that example there. And as we talked about earlier, Telegraf will dynamically construct that configuration file, and this is where that comes from. Of note here, when you’re doing the spacing here, this needs to be two spaces, not a tab, otherwise, the spacing will look incorrect when you generate the config file. You’ll notice it right away. So that’s one that had me stumped the first time I tried this, but is a very easy fix. So here I’ve implemented it. I’ve just put that Trig config in a variable. You can also just return the explicit string down there if you’d like as well.
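As a rough sketch of the sample config step, the snippet below keeps the config text in a variable and returns it from `SampleConfig`. The comment text inside the config is illustrative, and `hasTabIndent` is a hypothetical helper added here only to show the two-spaces-not-tabs rule being checked.

```go
package main

import (
	"fmt"
	"strings"
)

// trigConfig holds the sample configuration. Each line is indented with two
// spaces, not a tab, so it lines up in the generated config file.
var trigConfig = `
  ## Set the amplitude
  amplitude = 10.0
`

type Trig struct{}

// SampleConfig returns the configuration snippet for this plugin.
func (t *Trig) SampleConfig() string { return trigConfig }

// hasTabIndent reports whether any line of s begins with a tab character.
func hasTabIndent(s string) bool {
	for _, line := range strings.Split(s, "\n") {
		if strings.HasPrefix(line, "\t") {
			return true
		}
	}
	return false
}

func main() {
	trig := &Trig{}
	fmt.Printf("[[inputs.trig]]%s", trig.SampleConfig())
	fmt.Println("tab-indented lines:", hasTabIndent(trig.SampleConfig()))
}
```

A check like `hasTabIndent` could live in a unit test, since editors that convert spaces to tabs inside raw string literals are an easy way to reintroduce the problem.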
Jack Zampolin 21:28.612 Next, we’re going to write a description. So our Trig plugin inserts sine and cosine waves for demonstration purposes. Once you’ve done that, you can actually run the plugin. That gather function returns nil at this point, but the code will run. So we can generate our sample config here. So if we say -sample-config -input-filter trig, so our new plugin, -output-filter influxdb, and it will generate a lot. You’ll notice a lot of text there. There’s some standard agent config and a bunch of other things, but you should see down at the bottom inputs.trig and then the amplitude, what we just wrote there as well as the human-readable description.
Jack Zampolin 22:29.291 So with that configuration parameter that we’ve added in the config, we need to have a place in the Trig struct for it to live. In the config, we’ve called it amplitude with a lowercase a. In Go, you need to have uppercase letters in your struct properties in order to have those exported and available to outside programs like Telegraf from your plugin. So in the struct we’re going to name that Amplitude with a capital A. And that naming does need to be consistent. If you’re looking to use dashes or underscores, look at the way other Telegraf plugins are implemented, and you’ll see that that’s very easy to implement as well.
Jack Zampolin 23:20.663 So if you have any other variables that you want to use in your Trig struct or in your plugin struct, those need to be unexported. So if you’ll notice, we have an unexported X there as well. That’s just lowercase. That won’t get pulled from the config file. That will get pulled from your program as you write it, so important to note there. And here we’re using that to hold state of the plugin between collection intervals.
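The exported versus unexported distinction can be seen with any decoder that fills structs from outside the package. Telegraf reads TOML; the sketch below swaps in `encoding/json` from the standard library purely so the example is self-contained, and the `decode` helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trig has one exported field that configuration can fill in and one
// unexported field that only the plugin's own code can touch.
type Trig struct {
	x         float64 // internal state, invisible to decoders
	Amplitude float64 // set from configuration
}

// decode fills a Trig from JSON text. JSON stands in for TOML here.
func decode(data string) (Trig, error) {
	var t Trig
	err := json.Unmarshal([]byte(data), &t)
	return t, err
}

func main() {
	// The input tries to set both fields; only the exported one is populated.
	t, err := decode(`{"amplitude": 10.0, "x": 99.0}`)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("Amplitude:", t.Amplitude)
	fmt.Println("x:", t.x)
}
```

Because the decoder cannot see `x`, the plugin is free to use it for state between collection intervals without any risk of a config file overwriting it.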
Jack Zampolin 23:57.735 So this is where all the magic happens in that gather function. So the Gather function is at the heart of the plugin. It is run every time Telegraf collects metrics. So if you think of Telegraf just iterating over the list of plugins that are enabled on every collection interval, it’s just calling gather on each one of them. That’s what the interface is for, and that’s what makes this such a powerful agent because it is so simple. So that interval is configurable. That’s how often the gather function gets run. And then there’s another couple of options in that agent section, one of them being flush interval. And that flush interval is how often you will write to your configured outputs. So if you turn the interval down to 1 second when you’re testing, but you only see it flushing to the database every 10 seconds, that’s the configuration parameter you need to change. You can also write plugins to collect at independent intervals or in the case of a service plugin, like StatsD that we talked about earlier, that will continually listen for points. And then at the interval, Telegraf will scrape the points that it’s collected since the last interval and write those.
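The relationship between the collection interval and the flush interval can be illustrated with a toy model. This is not how the Telegraf agent is implemented; the `agent` type, its fields, and the whole-second loop are assumptions made only to show why points gathered every second still arrive in batches.

```go
package main

import "fmt"

// agent is a toy model: it gathers one point every `interval` seconds and
// writes the buffered points out every `flushInterval` seconds.
type agent struct {
	interval      int
	flushInterval int
	buffer        []string // gathered but not yet written
	written       []string // delivered to the output
}

// run simulates the given number of seconds without sleeping.
func (a *agent) run(seconds int) {
	for s := 1; s <= seconds; s++ {
		if s%a.interval == 0 {
			a.buffer = append(a.buffer, fmt.Sprintf("point@%ds", s))
		}
		if s%a.flushInterval == 0 {
			a.written = append(a.written, a.buffer...)
			a.buffer = nil
		}
	}
}

func main() {
	a := &agent{interval: 1, flushInterval: 10}
	a.run(25)
	fmt.Println("written:", len(a.written))
	fmt.Println("still buffered:", len(a.buffer))
}
```

With a 1 second interval and a 10 second flush interval, the output only sees new points at the 10 and 20 second marks, which matches the behavior described for testing.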
Jack Zampolin 25:29.326 So the function that we’re looking at here is in core Telegraf, and it’s telegraf.Accumulator.AddFields. This is a representation of a point in InfluxDB. It also goes to the internal representation that Telegraf has of these metrics. So that method takes a measurement, which is a string, tags, which is a map[string]string. So it’s a map with strings as keys and strings as values. And then fields, which is a map[string]interface{}. So a map that has string keys and anything as its values. That defines the new point in InfluxDB, and the timestamp that it’s given is the timestamp at the collection interval.
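To make the shape of a point concrete, here is a sketch of a recording accumulator. The `recorder` and `point` types are hypothetical, and the parameter order of `AddFields` in this stand-in is an assumption; check the accumulator definition in the Telegraf repository for the exact signature.

```go
package main

import (
	"fmt"
	"time"
)

// point is an illustrative representation of one metric.
type point struct {
	Measurement string
	Tags        map[string]string      // string keys, string values
	Fields      map[string]interface{} // string keys, values of any type
	Time        time.Time
}

// recorder is a stand-in accumulator that keeps every point it is given.
type recorder struct {
	points []point
}

// AddFields stores a point, stamping it with the time of the call.
func (r *recorder) AddFields(measurement string, fields map[string]interface{}, tags map[string]string) {
	r.points = append(r.points, point{
		Measurement: measurement,
		Tags:        tags,
		Fields:      fields,
		Time:        time.Now(),
	})
}

func main() {
	rec := &recorder{}
	rec.AddFields("trig",
		map[string]interface{}{"sine": 0.0, "cosine": 10.0},
		map[string]string{"host": "example"},
	)
	p := rec.points[0]
	fmt.Println(p.Measurement, p.Tags, p.Fields)
}
```

Fields carry the measured values and may be numbers, strings, or booleans, while tags are always strings and act as metadata for filtering and grouping.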
Jack Zampolin 26:33.523 When you’re writing your gather function here, we’re going to generate a sine wave. So we’ve imported the math package up top. We’re going to say math.Sin, and then notice that we’re using the state of the plugin, X, that changes every time to give you that variation. And then the height of it is going to be determined by that configuration parameter that we’ve put in amplitude there.
Jack Zampolin 27:09.555 We store those values in fields. So we make that map for the fields, and then we make keys for each sine and cosine and store them in there. Just to note, those fields will appear in InfluxDB in that format. So if you like your fields to be named something specific, make sure you name that here. We don’t have any additional tags that we’re putting in. If there was some metadata that you wanted to use to distinguish between different metrics, you would add your tags to that tags map there. One thing to note, if you generate too many different values for tags in InfluxDB, it can use up a lot of memory. That’s due to something called series cardinality. So it’s important to design good schema when you’re writing Telegraf plugins, and that’s one place where people easily mess up. One of the common errors there is if you’re adding a UUID for every point that’s collected, that can quickly generate a lot of series in Influx and lead to a runaway memory usage. So just make sure that you don’t do that.
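The cardinality warning can be made concrete by counting distinct series. The sketch below assumes a simplified model in which a series is identified by the measurement name plus its sorted tag pairs; `seriesKey` and `cardinality` are hypothetical helpers, not InfluxDB or Telegraf functions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey builds an identifier from the measurement and its sorted tags.
func seriesKey(measurement string, tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(measurement)
	for _, k := range keys {
		b.WriteString("," + k + "=" + tags[k])
	}
	return b.String()
}

// cardinality counts the distinct series produced by a list of tag sets.
func cardinality(measurement string, tagSets []map[string]string) int {
	seen := map[string]struct{}{}
	for _, tags := range tagSets {
		seen[seriesKey(measurement, tags)] = struct{}{}
	}
	return len(seen)
}

func main() {
	var bounded, unbounded []map[string]string
	for i := 0; i < 1000; i++ {
		// A host tag with three possible values stays small.
		bounded = append(bounded, map[string]string{"host": fmt.Sprintf("web-%d", i%3)})
		// A unique ID per point creates a new series every time.
		unbounded = append(unbounded, map[string]string{"request_id": fmt.Sprintf("id-%d", i)})
	}
	fmt.Println("series with host tag:", cardinality("trig", bounded))
	fmt.Println("series with per-point id tag:", cardinality("trig", unbounded))
}
```

A value that is unique per point usually belongs in a field rather than a tag, since fields do not multiply the number of series in this model.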
Jack Zampolin 28:36.962 Then after that, we want to make sure that we’re updating the internal state of the plugins, so we’re moving things forward so that we [inaudible]. And then we’re [inaudible] and make our point. So Trig, which is the name of the measurement, it’s really good practice to name your measurement whatever the plugin name is, and that really helps avoid confusion. If you’re naming it something else, maybe the plugin name is Docker. So our Docker plugin, for example, will generate a few different measurements, [inaudible] container CPU, container memory. There’s a few different measurements in Docker, but they each have Docker [inaudible]. Just make sure that your naming here is common sense and something that anyone using the plugin would understand, and that’s generally using the name of [inaudible]. Just be sure to do that. And then we’re returning nil here. You can also return error [inaudible] again to pull an API. You can imagine making that HTTP request. You’re going to have to handle the error. If the error is fatal, return it, and that will get dumped to the logs for Telegraf.
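For a plugin that polls an API, the error path might look like the sketch below. Everything here is hypothetical: `APIPoller`, its injected `fetch` function (standing in for an HTTP request), and `countAcc` exist only to show a Gather function returning a fatal error instead of adding a point.

```go
package main

import (
	"errors"
	"fmt"
)

// Accumulator is a local stand-in for the Telegraf accumulator.
type Accumulator interface {
	AddFields(measurement string, fields map[string]interface{}, tags map[string]string)
}

// countAcc counts how many points were added.
type countAcc struct{ n int }

func (c *countAcc) AddFields(string, map[string]interface{}, map[string]string) { c.n++ }

// errUnavailable simulates a failed request.
var errUnavailable = errors.New("service unavailable")

func failingFetch() (float64, error) { return 0, errUnavailable }

// APIPoller is a hypothetical plugin; fetch stands in for an HTTP call.
type APIPoller struct {
	fetch func() (float64, error)
}

// Gather returns the error with context so the caller can log it, and adds
// no point when the fetch fails.
func (p *APIPoller) Gather(acc Accumulator) error {
	v, err := p.fetch()
	if err != nil {
		return fmt.Errorf("apipoller: fetching value: %w", err)
	}
	acc.AddFields("apipoller", map[string]interface{}{"value": v}, map[string]string{})
	return nil
}

func main() {
	acc := &countAcc{}
	p := &APIPoller{fetch: failingFetch}
	fmt.Println("error:", p.Gather(acc))
	fmt.Println("points added:", acc.n)
}
```

Wrapping the error with the plugin name makes the log line easier to trace back to its source when many plugins are enabled.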
Jack Zampolin 30:10.413 So one other thing that we need to do is add starting state. So you’ve noticed we’ve incremented the state there on every gather so that we can generate that cool sine wave. When you’re instantiating the plugin, so in that inputs.Add down there in the init function, you do need to add starting states. So just notice that X equals 0.0, and we’ve made those floats, both X and amplitude are floats. So again, this function packages up the whole plugin, that Trig struct, and all the methods defined on the Trig struct, and gives it to the Telegraf agent. And on each interval, it will iterate through all of those plugins.
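Putting the pieces from the last few steps together, a Gather function along these lines would produce the sine and cosine fields. This is a sketch under assumptions: `Accumulator` and `lastAcc` are local stand-ins, `newTrig` is a hypothetical constructor playing the role of the function passed to the registry, and the step of 1.0 per gather is chosen for illustration rather than taken from the actual plugin.

```go
package main

import (
	"fmt"
	"math"
)

// Accumulator is a local stand-in for the Telegraf accumulator.
type Accumulator interface {
	AddFields(measurement string, fields map[string]interface{}, tags map[string]string)
}

// lastAcc remembers the most recent fields it was given.
type lastAcc struct {
	fields map[string]interface{}
}

func (a *lastAcc) AddFields(m string, fields map[string]interface{}, tags map[string]string) {
	a.fields = fields
}

// Trig holds the configured amplitude and the internal state x.
type Trig struct {
	x         float64
	Amplitude float64
}

// Gather computes one sine and one cosine value, advances the state, and
// hands the point to the accumulator.
func (t *Trig) Gather(acc Accumulator) error {
	fields := map[string]interface{}{
		"sine":   math.Sin(t.x) * t.Amplitude,
		"cosine": math.Cos(t.x) * t.Amplitude,
	}
	tags := map[string]string{} // no extra metadata for this plugin
	t.x += 1.0                  // move the wave forward for the next interval
	acc.AddFields("trig", fields, tags)
	return nil
}

// newTrig returns a plugin with explicit starting state.
func newTrig() *Trig { return &Trig{x: 0.0, Amplitude: 10.0} }

func main() {
	trig := newTrig()
	acc := &lastAcc{}
	for i := 0; i < 3; i++ {
		if err := trig.Gather(acc); err != nil {
			fmt.Println("gather error:", err)
			return
		}
		fmt.Printf("sine=%.3f cosine=%.3f\n", acc.fields["sine"], acc.fields["cosine"])
	}
}
```

Keeping `x` on the struct is what lets successive calls trace out a wave instead of returning the same value every interval.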
Jack Zampolin 31:09.371 Now, it’s time to test your plugin. So to compile, again, just call make from the root of the Telegraf directory. We’ll generate our configuration again. I’m going to dump it in something called telegraf.conf.test. There is a convenient .gitignore entry in the Telegraf directory for anything called .test, so that’s a nice little trick there. And then we’re going to run it with debug enabled. And you should see that run. So you’ll see the gathered metrics. Once you’ve done that, you can graph it with Chronograf. The query on InfluxDB would be select cosine, sine from the trig measurement, and then just pull the last section of time. And you can see there the default amplitude we’ve given it is 10. So you can see the high of 10, the low of negative 10, and we’re getting that nice sine wave, which is pretty cool.
Jack Zampolin 32:22.960 So Aris just asks, “Is it possible to use a special input-output data format in our plugin? Otherwise, how hard is it to write a new input-output data format to be used by any plugin?” The answer there is we have the list of current ones. I think it’s, like, Graphite, JSON, InfluxData format, maybe one or two more. Basically, what you need to do is take what the—where are we here? Take what that accumulator add fields gives you. So you’ve got the measurement name, the field values, and those tags, and then you can take that and output that in whatever format you want. It is a relatively simple thing to do to write a new output data format. The input data formats are a little bit more involved, but because the output data format, you’ve got a very specific list of resources, you just transform those and do that. So not too hard. I hope that answers your question, Aris. Aris, if I’ve mispronounced that, I’m sorry about that. Yeah.
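As a rough illustration of what transforming a measurement name, tags, and fields into an output format involves, the sketch below renders a point in a simplified form of InfluxDB line protocol. The `serialize` function is hypothetical, handles float fields only, and does no escaping of special characters; it is not Telegraf's serializer interface.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// serialize renders one point as "measurement,tags fields timestamp".
// Keys are sorted so that the output is deterministic.
func serialize(measurement string, tags map[string]string, fields map[string]float64, unixNano int64) string {
	var b strings.Builder
	b.WriteString(measurement)

	tagKeys := make([]string, 0, len(tags))
	for k := range tags {
		tagKeys = append(tagKeys, k)
	}
	sort.Strings(tagKeys)
	for _, k := range tagKeys {
		b.WriteString("," + k + "=" + tags[k])
	}

	fieldKeys := make([]string, 0, len(fields))
	for k := range fields {
		fieldKeys = append(fieldKeys, k)
	}
	sort.Strings(fieldKeys)
	parts := make([]string, 0, len(fieldKeys))
	for _, k := range fieldKeys {
		parts = append(parts, k+"="+strconv.FormatFloat(fields[k], 'f', -1, 64))
	}

	b.WriteString(" " + strings.Join(parts, ",") + " " + strconv.FormatInt(unixNano, 10))
	return b.String()
}

func main() {
	line := serialize("trig",
		map[string]string{"host": "example"},
		map[string]float64{"sine": 0, "cosine": 10},
		1500000000000000000,
	)
	fmt.Println(line)
}
```

A different output format would reuse the same three inputs and change only how they are joined together.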
Jack Zampolin 33:58.633 All right. Robert asks, “Is there any way to provide a timestamp on the adds function?” As I mentioned earlier, the timestamp will be generated at the gather function time. I don’t believe there is a way to add a timestamp on the add fields function. Matt asks, “I’m working on an output plugin that writes to various AWS services, aws-sdk-go.” Nice. A great library. “My clients need to use an HTTP proxy to access these endpoints. Does Telegraf provide a way to use proxies for AWS HTTP-based endpoints?” Writes to various AWS services. I mean, depending on the proxy implementation, it should just pass it through. That would be more of a question for the AWS SDK folks, i.e., can you use those methods with a proxy? Like, when you instantiate the client for AWS, how do you configure that for a proxy? So I don’t really know the best way to answer that. I’m sorry, Matt, but hopefully that helps.
Jack Zampolin 35:29.366 Okay. All right. Finally, the last thing we need to do is write some tests. So if you want to contribute your plugin back upstream, one of the requirements is that you have tests for your plugin. We try to run these tests. Well, we do run these tests every time we do builds on every push. So it looks for regressions. This is very important, so make sure you do this. In Go, tests are very easy to write, and native to the language. You just write functions that take in the testing struct and you use that to make assertions for your different plugins.
Jack Zampolin 36:26.496 Here, we need to make a new instance of our Trig plugin and give it some initial parameters. You’re not going to be generating config here, so you have to explicitly pass in those values that you would normally get from the config. And then here, we’re just going to run the functions here about 10 times. Notice, we need to use the Telegraf testutil accumulator. This is the testing accumulator equivalent to the accumulator in core Telegraf. So we’re going to use those same functions to generate our dummy data here. We’re going to call the gather function with the accumulator, and then check to see if those values are the same. You’re going to have your own custom logic here for whatever your plugin is. Maybe you’re going to make an HTTP call or some other REST-based call. Maybe you’re mocking up some JSON response and comparing that against the results from a remote API. There’s a number of different ways to do testing, and I would encourage you to look at an article on Go testing for whatever your specific plugin does. If you’re writing a service plugin, one that exposes a server, you might want to send it some test data and then make sure it generates the correct fields.
Jack Zampolin 38:21.424 So we’re going to use another function from that testutil up there. The accumulator has assert contains fields and assert contains tags if you’ve generated some tags. I would encourage you to look at the documentation on that if that’s at all confusing. You pass in that testing function, that testing struct, the measurement name and then the fields. So once you’ve done that, you need to create a pull request on Telegraf. You need to have a README in your plugin, a LICENSE file in your plugin if that’s applicable. In that README, you need to include samples of the input or output format that you’re writing for.
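The test loop described in this part of the talk can be sketched as a plain program. In the Telegraf repository this would be a `func TestTrig(t *testing.T)` in a `_test.go` file using the testutil accumulator; here `testAccumulator`, `containsFields`, and `checkTrig` are hypothetical stand-ins so that the example runs on its own, and the step of 1.0 per gather is an assumption.

```go
package main

import (
	"fmt"
	"math"
	"reflect"
)

// testAccumulator is a stand-in for a testing accumulator: it records points.
type testAccumulator struct {
	measurements []string
	fields       []map[string]interface{}
}

func (a *testAccumulator) AddFields(m string, fields map[string]interface{}, tags map[string]string) {
	a.measurements = append(a.measurements, m)
	a.fields = append(a.fields, fields)
}

// containsFields reports whether a point with this name and fields was added.
func (a *testAccumulator) containsFields(m string, want map[string]interface{}) bool {
	for i := range a.measurements {
		if a.measurements[i] == m && reflect.DeepEqual(a.fields[i], want) {
			return true
		}
	}
	return false
}

type Trig struct {
	x         float64
	Amplitude float64
}

func (t *Trig) Gather(acc *testAccumulator) error {
	fields := map[string]interface{}{
		"sine":   math.Sin(t.x) * t.Amplitude,
		"cosine": math.Cos(t.x) * t.Amplitude,
	}
	t.x += 1.0
	acc.AddFields("trig", fields, map[string]string{})
	return nil
}

// checkTrig runs Gather ten times and compares each result with values
// computed independently from the loop counter.
func checkTrig() error {
	trig := &Trig{Amplitude: 10.0} // values normally read from the config
	for i := 0.0; i < 10.0; i++ {
		acc := &testAccumulator{}
		want := map[string]interface{}{
			"sine":   math.Sin(i) * trig.Amplitude,
			"cosine": math.Cos(i) * trig.Amplitude,
		}
		if err := trig.Gather(acc); err != nil {
			return err
		}
		if !acc.containsFields("trig", want) {
			return fmt.Errorf("iteration %v: fields do not match", i)
		}
	}
	return nil
}

func main() {
	fmt.Println("check result:", checkTrig())
}
```

Using a fresh accumulator on every iteration keeps each comparison limited to the point produced by that single Gather call.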
Jack Zampolin 39:20.020 That wraps up our presentation today. I have time for any questions. I’ll stick around here for another 15 minutes or so, and answer your questions. And again, that’s questions on writing Telegraf plugins. I know this only covered inputs, but if you have any questions about other aspects of Telegraf, I’d be happy to answer those. I’m also happy to answer any questions on InfluxDB, Kapacitor, or Chronograf as well. So thank you very much for your time, and looking forward to your questions.