In this webinar, Regan Kuchan will describe how to set up InfluxDB & Telegraf to pull metrics into your InfluxDB instance. She will also provide an introduction to querying data with InfluxQL.
Watch the webinar “Intro to InfluxDB & Telegraf” by clicking on the download button on the right. This will open the recording.
Here is an unedited transcript of the webinar “Intro to InfluxDB & Telegraf.” This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.
• Regan Kuchan: Technical Writer, InfluxData
Regan Kuchan 00:02.409 Good morning. Good evening for some of you. My name is Regan, and today I’m going to be introducing two important components of InfluxData’s TICK stack, our platform for time series data. The components make up the T and the I of the TICK stack. They are Telegraf and InfluxDB. When you’re working with time series data, you’ll want to get familiar with Telegraf, our plugin-driven agent for collecting, processing, aggregating, and writing metrics, and with InfluxDB, our time series database built to handle high write and query loads. You can use these two TICK stack components independently, or you can use both of them together to create a pretty powerful architecture for your real-time monitoring and analytics setup. To give you an idea of what I will be covering today, I’ll be going into detail about what InfluxDB is and what Telegraf is, and I’ll describe how you can use these two products together. I’ll also go into detail about using these products.
Regan Kuchan 01:03.911 First I’ll introduce them using Chronograf. Chronograf is InfluxData’s open-source web application that serves as a user interface for the TICK stack. It’s currently in beta, but I think it’s a great way to first get to know Telegraf and InfluxDB. Second, I’ll get into more detail about using Telegraf and InfluxDB without Chronograf. So that section of the presentation will provide a more fine-grained detailed insight into what goes into using Telegraf and InfluxDB.
Regan Kuchan 01:37.356 Let’s start with what these two components are and how they fit together in your monitoring setup. First, we have Telegraf. Telegraf is InfluxData’s open-source agent. It’s written in Go, and it’s designed to collect metrics using input plugins. It can process and aggregate those metrics, and it can also write those metrics to output plugins. Basically, Telegraf takes in data, does stuff to them, and then outputs those data to where you want to store them. Telegraf supports over 70 input plugins and over 20 output plugins, so there are a lot of options when you’re working with Telegraf.
Regan Kuchan 02:17.506 Next, we have InfluxDB. InfluxDB is InfluxData’s time series database. It specializes in working with time series data, that is, data that occur over time. InfluxDB is written in Go, and it’s built from the ground up to handle high write loads and high query loads. Before I go any further, I’d like to mention that I’m going to be working with the open-source version of InfluxDB. There is a closed-source version of InfluxDB that offers high availability and clustering. Most of what I talk about today, however, does apply to that clustering version of InfluxDB as well. Okay. So how do Telegraf and InfluxDB fit together? Well, Telegraf collects the time series data, and InfluxDB is one of its output plugins. So Telegraf writes the data that it collects to your InfluxDB instance. Here’s a practical example of how Telegraf and InfluxDB work together. We have Telegraf configured to collect data about our CPU usage. It turns those data into line protocol format. That’s the accepted text-based format for writing points to InfluxDB. And it writes those data to InfluxDB. Once they’re in InfluxDB, you can query the data about your CPU usage using InfluxDB’s SQL-like query language. You can manage your data so that you only keep the data that you’re interested in. You can aggregate the data, and so on. And you can use something like Chronograf or Grafana to visualize the CPU usage data that are now stored in InfluxDB.
Regan Kuchan 03:56.407 Now that you know what Telegraf and InfluxDB are and how they fit together, let’s start to use them. As I said earlier, I’m going to introduce them using Chronograf. Chronograf is currently in beta, and it offers a user interface for working with the TICK stack. I’m not going to cover everything that Chronograf can do. I’m only going to focus on how Telegraf and InfluxDB fit in with this user interface. If you’re interested in learning more about Chronograf, I recommend checking it out on GitHub. By the end of this session, you will have set up Telegraf to write system statistics to InfluxDB, and you’ll be able to interact with those data in Chronograf. Throughout this presentation, I’m going to assume that you’re on an Ubuntu 16.04 system, but these steps are applicable to most [inaudible] in Linux. I should also note that I’m doing all of this on the same machine. If you plan on setting these products up on separate machines, you’ll need to do a little bit more configuring on your end.
Regan Kuchan 04:55.928 Okay, so let’s start out with installing InfluxDB. Looking at the steps on the slide, we download InfluxDB, install it, and then start it. Pretty straightforward. For our purposes, we don’t need to make any changes to InfluxDB’s configuration file. The default settings will work just fine. Next, we download, install, and start Telegraf. Again, Telegraf’s default configuration file works fine for what we’re doing. Telegraf’s system input plugin is enabled by default in its configuration file. It collects information about your system’s CPU, memory, disk usage, and other system-level data. Telegraf’s default output plugin is InfluxDB. So it automatically formats and writes the collected points to InfluxDB.
Regan Kuchan 05:51.513 The third step is to download, install, and start Chronograf. These steps are listed on this slide. Next, visit localhost:8888 in your browser. Note that you’d need to replace localhost if you’re not running Chronograf locally. Assuming that everything is up and running properly, you’ll see this sign-up form. All you need to do is fill out the connection string input for your InfluxDB instance. Assuming you’ve kept all of the defaults and you’re running it locally, localhost at port 8086 is the right value to enter. Next, name your InfluxDB instance, and finally, enter the database that’s storing your Telegraf data. By default, Telegraf creates a database called telegraf and stores all of its data in that database. Notice that here, you don’t need to fill out the username and password inputs, because InfluxDB doesn’t have authentication enabled by default. When you’re done, click Connect New Source and you’ll be inside the Chronograf web application. Like I said earlier, I won’t be going into everything that Chronograf can do. That’s an entire webinar of its own. But I will be talking about how Telegraf and InfluxDB fit into Chronograf.
Regan Kuchan 07:17.116 The first page that you see in Chronograf is the host list page. The first thing to notice is the little icon in the top right corner. That’s the InfluxDB instance that Chronograf is communicating with. Next, let’s look at the host names on the page. These host names come from Telegraf data. When Telegraf writes data to InfluxDB, it automatically tags those data with the host name of the server that they were collected from. If you ran through the setup steps on the previous slide with me, and you’re running Telegraf on your local machine, you should see just one host in the host name list. That host name is your local machine’s host name. On this slide, I have two host names, where the chronobirds are and where the chronowilds are. That means I have two Telegraf instances that are writing data to my single InfluxDB instance.
Regan Kuchan 08:13.769 All right. The next thing to note is the apps column on this page. The apps column lists all of the configured Telegraf input plugins. We only have one configured. That’s the system input plugin. If you click on that one plugin, it’ll take you to a pre-configured dashboard for the system statistics input plugin. All of the data in the dashboard were collected by Telegraf and then written to InfluxDB. Chronograf just turns those data into the visualizations that you see on this slide. You can do some cool things with this system stats dashboard in Chronograf. You can zoom in on the graphs and pan the graphs. You can configure the auto-refresh interval to your liking. You can change the time interval displayed in the graphs, and you can change the view to presentation mode. That just gets rid of the left sidebar and leaves you with just the graphs for display purposes.
Regan Kuchan 09:15.163 Okay. Back to Telegraf and InfluxDB and Chronograf. This slide shows Chronograf’s data explorer. The data explorer allows you to query the Telegraf data that are stored in InfluxDB, and allows you to create visualizations from those data. The section at the bottom of the image is the query builder. You can see that it displays the information necessary for creating queries based on Telegraf system statistics data. In this image, I’ve already created a query that selects the CPU usage idle data that occurred within the past 15 minutes, then I grouped the results by each host name that I have. If you remember from the host list page, I had two host names. So my graph has two lines on it. My query on the slide is in InfluxQL format. InfluxQL is InfluxDB’s SQL-like query language. If you haven’t been exposed to InfluxQL, have no fear. I will be giving an overview of the query language and more in the next section.
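A minimal sketch of the kind of query described above, assuming a local InfluxDB 1.x instance at `http://localhost:8086`, Telegraf's default `telegraf` database, and the `cpu` measurement with its `usage_idle` field and `host` tag. The exact statement on the slide is not in the transcript, so the query text here is an approximation, and the helper name `build_query_url` is hypothetical. It only builds the URL for InfluxDB's `/query` HTTP endpoint and does not send anything.

```python
from urllib.parse import urlencode

# Assumed values: a local InfluxDB 1.x instance and Telegraf's default database.
BASE_URL = "http://localhost:8086"
DATABASE = "telegraf"

# usage_idle values from the past 15 minutes, one result series per host tag value.
QUERY = 'SELECT "usage_idle" FROM "cpu" WHERE time > now() - 15m GROUP BY "host"'


def build_query_url(base_url: str, db: str, query: str) -> str:
    """Return a URL for the /query endpoint with the statement URL-encoded."""
    return f"{base_url}/query?{urlencode({'db': db, 'q': query})}"


if __name__ == "__main__":
    print(build_query_url(BASE_URL, DATABASE, QUERY))
```

Grouping by the `host` tag is what produces one line per host on the graph.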
Regan Kuchan 10:22.438 So now you’ve gotten a user interface introduction to Telegraf and InfluxDB. Now I’m going to go into more detail about InfluxDB. In the next slides, I’ll talk about InfluxDB’s data model, how to format points and write them to InfluxDB, how to query data in InfluxDB, and finally, I’ll give an overview of InfluxDB’s data management tools. One of the first things that we recommend new users learn about is the InfluxDB data model. The data model is just how data are organized in InfluxDB. On this slide, I have a sample of Telegraf system stats data in tabular format. I’ll be walking through each component of that table and describing how it fits into the data model. So, actually, the first item I want to talk about is not the table. The first item is a database. A database is a logical container for your time series data, users, retention policies, and continuous queries. I will cover what those things are in a second, but just know that everything in this table on the slide belongs to a database. Also note that a single InfluxDB instance can have several databases.
Regan Kuchan 11:44.401 Next, I’m going to talk about measurements. In this table, cpu is the measurement. A measurement is the part of InfluxDB’s data structure that acts as a container for timestamps, fields, and tags. The measurement name is always stored as a string, and for any SQL users out there, a measurement is similar to a table. The next part of InfluxDB’s data model is the tag key. In this case, host is the only tag key in the cpu measurement. Tag keys are always stored as strings. The tag key is associated with a tag value or several tag values. In this table, the tag key host has two tag values, where the chronobirds are and where the chronowilds are. Like tag keys, tag values are always stored as strings in InfluxDB. Unique tag key-value pairs make up one tag. In the table on the slide, there are two tags. The first is where host equals where the chronobirds are. And the second tag is where host equals where the chronowilds are. Tags are an important part of InfluxDB’s data model. They store metadata about individual data points, and tags are indexed. This means that queries on tags are faster than queries on fields. I’ll get to fields in just a second. Just remember that when you’re thinking about how to store your data in InfluxDB, we recommend storing commonly queried metadata in tags.
Regan Kuchan 13:22.874 The next component of InfluxDB’s data model is the field key. In this case, usage_idle is the only field key in the cpu measurement. Field keys are always stored as strings. The field key is associated with a field value or several field values. In this table, the field key usage_idle is associated with the four floats in that column. Field values can be floats, booleans, strings, or integers. Unique field key-value pairs make up one field. In the table on the slide, there are four different fields. The first is where usage_idle equals 97.085, and the fourth is where usage_idle equals 98.5. As I mentioned before, fields are not indexed. So when compared to queries on tags, queries on fields are not as performant.
Regan Kuchan 14:22.488 Next, I’ll focus on the timestamp. Every measurement that you have in InfluxDB has a time column. InfluxDB is a time series database. If you’re a SQL user, time is essentially always the primary key. In InfluxDB, timestamps are always in Coordinated Universal Time, or UTC. Now we’re getting to one of the most important parts of the InfluxDB data model, the series. A series is a unique combination of a measurement and tag key-value pairs within a database. In this table, we have two series. The first is where the measurement is cpu and the tag is host equals where the chronobirds are. The second series is where the measurement is cpu and the tag is host equals where the chronowilds are. Understanding series, especially how many series you have and how many you expect to have, is really important when you’re working with InfluxDB. If your series cardinality, that is, the number of unique series in your InfluxDB instance, is extremely high and is expected to keep growing, you can end up with memory issues. InfluxDB maintains an in-memory index of every series in the system. If your series cardinality gets high enough, the operating system may kill the InfluxDB process with an out-of-memory exception. This is something to remember and keep an eye on as you work with InfluxDB.
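To make the idea of series cardinality concrete, here is a small sketch that counts distinct series the way the definition above describes them: one series per unique measurement and tag set. It is an illustration only, not how InfluxDB implements its index, and the host names `host-a` and `host-b` are hypothetical stand-ins for the two hosts on the slide.

```python
from typing import Dict, Iterable, Set, Tuple

# A point is reduced here to (measurement, tags); fields and timestamps
# do not affect which series a point belongs to.
Point = Tuple[str, Dict[str, str]]
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def series_key(measurement: str, tags: Dict[str, str]) -> SeriesKey:
    """Identify a series by its measurement plus its sorted tag key-value pairs."""
    return (measurement, tuple(sorted(tags.items())))


def series_cardinality(points: Iterable[Point]) -> int:
    """Count the distinct series across a collection of points."""
    seen: Set[SeriesKey] = set()
    for measurement, tags in points:
        seen.add(series_key(measurement, tags))
    return len(seen)


if __name__ == "__main__":
    # Four points, two hosts: the same shape as the table on the slide.
    points = [
        ("cpu", {"host": "host-a"}),
        ("cpu", {"host": "host-b"}),
        ("cpu", {"host": "host-a"}),
        ("cpu", {"host": "host-b"}),
    ]
    print(series_cardinality(points))
```

Because every distinct tag value creates a new series, a tag whose values are unbounded (for example, a hypothetical per-request ID) grows cardinality with every write, which is the situation the warning above is about.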
Regan Kuchan 15:53.430 The last part of the data model is a point. A point is the collection of fields with a unique series and timestamp combination. In this table, we have four different points. The green circle identifies a point by its series and timestamp. The series is made up of the cpu measurement and the tag host equals where the chronobirds are, and the timestamp is March 15th at 08:11 PM UTC. Notice that the field key is not part of the definition of a point. A point can consist of several field keys and their values with the same timestamp. Okay. Now that you know about InfluxDB’s data model, we need to know how to represent that model textually. When you write data into InfluxDB, you use a specific text format called line protocol. Here’s an example of line protocol using the data that we saw in the previous slide. The cpu measurement comes first, followed by a comma and the tag. That’s host equals where the chronowilds are. Then there’s a whitespace followed by the field, that’s usage_idle equals 97.085. Finally, there’s another whitespace and the timestamp in epoch nanoseconds. That’s the number of nanoseconds that have elapsed since midnight, January 1st, 1970 UTC. Note that line protocol is whitespace-sensitive. Any unintentional whitespace can cause parsing errors when writing data to InfluxDB.
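The definition of a point and the epoch-nanosecond timestamp can be sketched as follows. This is a toy in-memory model, not InfluxDB's storage engine. The year 2017, the host names, the `usage_user` field in the example, and the helper names `to_epoch_ns` and `write_point` are assumptions made for illustration; the transcript only gives the date as March 15th at 08:11 PM UTC.

```python
from datetime import datetime, timezone
from typing import Dict, Tuple

# A point is identified by its series (measurement + tag set) and its timestamp.
PointKey = Tuple[str, Tuple[Tuple[str, str], ...], int]


def to_epoch_ns(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch nanoseconds (whole seconds only)."""
    return int(moment.timestamp()) * 1_000_000_000


def write_point(
    store: Dict[PointKey, Dict[str, float]],
    measurement: str,
    tags: Dict[str, str],
    fields: Dict[str, float],
    timestamp_ns: int,
) -> None:
    """Store fields under (series, timestamp); field keys are not part of the identity."""
    key = (measurement, tuple(sorted(tags.items())), timestamp_ns)
    # Writing to an existing series and timestamp adds to that point's
    # field set rather than creating a second point.
    store.setdefault(key, {}).update(fields)


if __name__ == "__main__":
    store: Dict[PointKey, Dict[str, float]] = {}
    ts = to_epoch_ns(datetime(2017, 3, 15, 20, 11, tzinfo=timezone.utc))
    write_point(store, "cpu", {"host": "host-a"}, {"usage_idle": 97.085}, ts)
    write_point(store, "cpu", {"host": "host-a"}, {"usage_user": 1.5}, ts)
    print(ts, len(store))
```

The two writes share a series and a timestamp, so they end up as one point with two fields.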
Regan Kuchan 17:37.342 This next example shows how to include several tags and several fields in line protocol. The two tags, in this case host equals A and location equals B, occur right after the cpu measurement, and they are comma-separated. The multiple fields, in this case usage_idle equals 97.085 and value equals 2i, occur after a whitespace, and they’re also comma-separated. The i after the 2 indicates that 2 is an integer. InfluxDB assumes that all numerical field values are floats unless you include a trailing i on a value. The last example on this slide serves to show the minimum required items of line protocol. Line protocol must always include the measurement name, that’s cpu, and line protocol must always have at least one field. That’s value equals 2. Tags are optional. Not all data in InfluxDB have to have tags. And the timestamp is also optional. If you write the point to InfluxDB without the timestamp, the system automatically assigns your local server’s timestamp in UTC to the point.
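The rules just described (measurement first, optional comma-separated tags, at least one field, optional timestamp, trailing `i` for integers) can be captured in a short formatter. This is a simplified sketch: the function names are hypothetical, and it does not escape spaces, commas, or equals signs in measurement names, tag keys, or tag values, which real line protocol requires.

```python
from typing import Dict, Optional, Union

FieldValue = Union[float, int, bool, str]


def format_field_value(value: FieldValue) -> str:
    """Render one field value using line protocol's type conventions."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # a trailing i marks an integer
    if isinstance(value, float):
        return repr(value)  # numbers without an i are treated as floats
    # String field values are double-quoted, with backslashes and quotes escaped.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line(
    measurement: str,
    fields: Dict[str, FieldValue],
    tags: Optional[Dict[str, str]] = None,
    timestamp_ns: Optional[int] = None,
) -> str:
    """Build one line: measurement[,tags] fields [timestamp]."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    line = measurement
    for key, value in sorted((tags or {}).items()):
        line += f",{key}={value}"
    line += " " + ",".join(f"{k}={format_field_value(v)}" for k, v in fields.items())
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


if __name__ == "__main__":
    print(to_line("cpu", {"usage_idle": 97.085, "value": 2}, {"host": "A", "location": "B"}))
    print(to_line("cpu", {"value": 2.0}))  # minimal form: measurement and one field
```

Passing the Python integer `2` yields `value=2i`, while the float `2.0` is written without the suffix, mirroring the float-by-default rule.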
Regan Kuchan 18:55.500 There are several ways to write points to InfluxDB, including the HTTP API, several client libraries, Telegraf, which takes care of all the line protocol formatting for you, and InfluxDB’s command line interface, or the CLI. Here I use the CLI to write three points that we saw in a previous slide to InfluxDB. The CLI is an interactive shell for writing data and querying data. We use it throughout the documentation, and it’s a great way to get started writing and querying data in InfluxDB. It comes with every installation of InfluxDB. On this slide, I connect to the CLI with the influx command. I create a database called test. I use that database so that every write targets the test database. And then I write the three points by simply placing “INSERT” in front of the line protocol. Notice that the CLI doesn’t return anything if it successfully writes the point. You’d see a parsing error if the system couldn’t understand your line protocol. The last command on the slide queries back the data that you just wrote. You can see that InfluxDB is now storing those three points. The SELECT statement that you just saw is an example of InfluxQL. InfluxQL is InfluxDB’s SQL-like query language. On this slide, I’ll give a short overview of what InfluxQL can do for you. We have extensive documentation and videos about our query language on the website if you’d like to know more.
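The talk uses the CLI, but the HTTP API mentioned above accepts the same line protocol. The sketch below assumes an InfluxDB 1.x instance at `http://localhost:8086` with an existing database named `test`; the helper names and the sample lines are hypothetical. Building the request is separated from sending it so the request can be inspected without a running server.

```python
import urllib.request
from typing import Iterable
from urllib.parse import urlencode


def build_write_request(base_url: str, db: str, lines: Iterable[str]) -> urllib.request.Request:
    """Build a POST to /write whose body is newline-separated line protocol."""
    body = "\n".join(lines).encode("utf-8")
    url = f"{base_url}/write?{urlencode({'db': db})}"
    return urllib.request.Request(url, data=body, method="POST")


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code.

    InfluxDB 1.x answers a successful write with 204 No Content; urlopen
    raises urllib.error.HTTPError for 4xx/5xx responses such as parse errors.
    """
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Only builds and prints the request; call send(request) against a live instance.
    request = build_write_request(
        "http://localhost:8086",
        "test",
        ["cpu,host=A usage_idle=97.085", "cpu value=2"],
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

As with the CLI, a successful write returns no content, and malformed line protocol comes back as an error.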
Regan Kuchan 20:33.654 First off, InfluxQL has a suite of schema exploration queries. Schema exploration queries allow you to see how your data are organized in InfluxDB’s data model. They include queries that show databases, show series, and show measurements. There are more, and they are generally recognizable by having SHOW at their start. Next are the data exploration queries. These queries are like the SELECT statement that you saw in the previous slide. They allow you to select specific fields and tags from measurements. They support using aggregation functions, selector functions, transformation functions, and predictor functions on fields. Next, they allow you to filter your data based on specific tag key values, field key values, and time intervals. And finally, they allow you to group query results by specific tags and into specific time intervals.
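As a rough illustration of the two query families just described, the sketch below lists three schema exploration statements and assembles a data exploration statement from its parts. The builder `build_select` is a hypothetical helper that does plain string assembly with no validation or escaping, and the measurement, field, and tag names are carried over from the earlier Telegraf examples.

```python
from typing import Optional, Sequence

# Schema exploration statements start with SHOW.
SCHEMA_QUERIES = ["SHOW DATABASES", "SHOW SERIES", "SHOW MEASUREMENTS"]


def build_select(
    field: str,
    measurement: str,
    function: Optional[str] = None,
    where: Optional[str] = None,
    group_by: Sequence[str] = (),
) -> str:
    """Assemble a SELECT statement: optional function, filter, and grouping."""
    target = f'"{field}"'
    if function:
        target = f"{function}({target})"
    statement = f'SELECT {target} FROM "{measurement}"'
    if where:
        statement += f" WHERE {where}"
    if group_by:
        statement += " GROUP BY " + ", ".join(group_by)
    return statement


if __name__ == "__main__":
    for statement in SCHEMA_QUERIES:
        print(statement)
    # Mean idle CPU per host in 10-minute intervals over the last hour.
    print(
        build_select(
            "usage_idle",
            "cpu",
            function="mean",
            where="time > now() - 1h",
            group_by=["time(10m)", '"host"'],
        )
    )
```

The single call exercises each capability named above: a function on a field, a time filter, and grouping by both a tag and a time interval.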
Regan Kuchan 21:39.301 The last set of InfluxQL queries are for managing your data. They allow you to do things like create databases and retention policies. More on those in just a second. They allow you to drop databases and retention policies that you no longer want or need, and they allow you to delete data. This was a very brief insight into InfluxQL just to show the range of what it can do. Please do check out the documentation at this link if you would like some more information. Other features of InfluxDB that I’ve touched on are on the database management side of things. InfluxDB has retention policies. Retention policies describe how long InfluxDB keeps data and how many copies of the data exist. Note that the second point only applies to the closed-source clustering option. And finally, retention policies determine how the system handles shards. You can think of shards as files that store data that fall within a specific time range. InfluxDB also has continuous queries. Continuous queries are InfluxQL queries that run automatically and periodically on your data. And they store the results of those queries in InfluxDB. You can use continuous queries and retention policies to downsample data, that is, take high-precision data, like one-second resolution data, and aggregate those data to a lower precision, like one-hour resolution.
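The downsampling idea can be sketched in two parts. The first part holds InfluxQL statements of the general shape used for a retention policy and a continuous query; the policy name `one_day`, the query name `cq_1h_usage_idle`, and the target measurement `cpu_1h` are hypothetical, and `autogen` is assumed to be the existing retention policy that keeps data forever. The second part is a plain Python function that mimics what a mean grouped by one-hour intervals computes, so the aggregation itself can be seen without a database.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Keep raw data for one day (hypothetical policy name and duration).
CREATE_RP = 'CREATE RETENTION POLICY "one_day" ON "telegraf" DURATION 1d REPLICATION 1 DEFAULT'

# Periodically write hourly means into a longer-lived retention policy.
CREATE_CQ = (
    'CREATE CONTINUOUS QUERY "cq_1h_usage_idle" ON "telegraf" BEGIN '
    'SELECT mean("usage_idle") AS "mean_usage_idle" '
    'INTO "autogen"."cpu_1h" FROM "cpu" GROUP BY time(1h), * END'
)

SECOND_NS = 1_000_000_000
HOUR_NS = 3600 * SECOND_NS


def downsample_mean(points: Iterable[Tuple[int, float]], interval_ns: int) -> Dict[int, float]:
    """Average (timestamp_ns, value) pairs into buckets keyed by interval start."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp_ns, value in points:
        buckets[timestamp_ns - timestamp_ns % interval_ns].append(value)
    return {start: sum(values) / len(values) for start, values in sorted(buckets.items())}


if __name__ == "__main__":
    raw = [(0, 90.0), (SECOND_NS, 100.0), (HOUR_NS, 80.0)]
    print(downsample_mean(raw, HOUR_NS))
```

In this arrangement the raw one-second data expire with the short retention policy while the hourly means remain in the long-lived one.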
Regan Kuchan 23:10.472 And then once you downsample, you can configure the database to automatically expire the now unnecessary high-precision data. Doing this saves space and increases the efficiency of your InfluxDB instance. Downsampling is more of an advanced topic, and we do go into detail about it in a later webinar, but it’s good to know that it does exist. Moving on to the Telegraf portion of this talk. In the next two slides, I’ll discuss some of Telegraf’s plugins and touch on how to start working with Telegraf’s configuration file, and some of the more advanced options for data management. Telegraf input plugins collect metrics from the system, services, or from third-party APIs. As I mentioned earlier, there are over 70 available input plugins for Telegraf. And, in addition, if none of those work for you, you can write your own input plugins. I’ve listed just a few of the available input plugins on this slide. You’ve already seen the system plugin, which collects information about your system’s CPU, memory, disk usage, and other system-level data. There are also input plugins for StatsD, Exec, Socket Listener, Procstat, PostgreSQL, Kubernetes, and MySQL. The Telegraf GitHub repo includes a complete list of input plugins as well as READMEs for each one.
Regan Kuchan 24:39.142 Telegraf’s output plugins write the collected metrics to your destination of choice. As I mentioned earlier, there are over 20 available output plugins for Telegraf. You’ve already seen that InfluxDB is an output plugin. AWS CloudWatch and Graphite are also available output plugins. The Telegraf GitHub repo includes the complete list of all output plugins as well as READMEs for each one. When you use Telegraf, you’ll end up getting comfortable with its configuration file. Here I have the part of the configuration file that handles the InfluxDB output plugin. As you can see, you can configure the InfluxDB instance to target. You can configure the name of the database to write to and the retention policy to write to. By default, Telegraf writes to a retention policy that keeps the data forever. And there are other configuration options.
Regan Kuchan 25:36.604 Every output plugin is very configurable. Next, here’s an example of the CPU part of the system stats input plugin. The first configuration option allows you to disable Telegraf from reporting stats per CPU. You can disable Telegraf from reporting total CPU stats, and you can set that last option to true to collect raw CPU time metrics. Again, this just serves to show that Telegraf’s configuration file is important. Everyone’s setup will be somewhat tailored to their needs. You can find out more at this link, which is the documentation for the configuration file.
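The two configuration sections described above can be approximated with a small renderer that prints TOML-style plugin blocks. The slides themselves are not in the transcript, so the option names here (`urls`, `database`, `retention_policy`, `percpu`, `totalcpu`, `collect_cpu_time`) are an assumption based on Telegraf's sample configuration, and `render_plugin` is a hypothetical helper. It relies on the fact that JSON and TOML agree on how booleans, simple ASCII strings, and lists of such strings are written.

```python
import json
from typing import Dict, List, Union

OptionValue = Union[bool, str, List[str]]


def render_plugin(kind: str, name: str, options: Dict[str, OptionValue]) -> str:
    """Render one plugin section such as [[inputs.cpu]] with its options."""
    lines = [f"[[{kind}.{name}]]"]
    for key, value in options.items():
        # json.dumps output for these simple value types is also valid TOML.
        lines.append(f"  {key} = {json.dumps(value)}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Output side: which InfluxDB instance, database, and retention policy to write to.
    print(
        render_plugin(
            "outputs",
            "influxdb",
            {
                "urls": ["http://localhost:8086"],
                "database": "telegraf",
                "retention_policy": "",  # empty string: use the database's default policy
            },
        )
    )
    print()
    # Input side: per-CPU stats, total CPU stats, and raw CPU time metrics.
    print(
        render_plugin(
            "inputs",
            "cpu",
            {"percpu": True, "totalcpu": True, "collect_cpu_time": False},
        )
    )
```

The three `inputs.cpu` options correspond, in order, to the three settings walked through above.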
Regan Kuchan 26:24.421 The last item that I’d like to touch on about Telegraf is data management with Telegraf. Telegraf’s aggregator and processor plugins sit between the input and output plugins, and they manipulate metrics as they pass through Telegraf. Processor plugins process metrics as they pass through and immediately emit results based on the values they process. For example, processor plugins can add a tag to all metrics that pass through. Aggregator plugins emit new aggregated metrics, like an average or a minimum value. They process metrics over a specific time interval. Users can choose if they want to keep the unaggregated metrics in addition to the aggregated metrics. Again, aggregator and processor plugins are configured and tailored to your needs in the configuration file. That’s it for today. If you’re interested in learning more, please do check out our documentation and the many videos about using InfluxData’s platform on the website. If you have any questions when you’re working with our products, please don’t hesitate to post in the InfluxData community. Thank you for listening.
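The difference between the two plugin types can be illustrated with a toy pipeline. Real Telegraf plugins are written in Go and configured in the configuration file, so this is only a conceptual sketch: the `Metric` class, the function names, the `dc` tag, and the output field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Metric:
    name: str
    tags: Dict[str, str]
    fields: Dict[str, float]
    timestamp: int  # seconds since the epoch


def add_tag_processor(metrics: Iterable[Metric], key: str, value: str) -> List[Metric]:
    """Processor: transform each metric as it passes through, one in, one out."""
    return [Metric(m.name, {**m.tags, key: value}, dict(m.fields), m.timestamp) for m in metrics]


def min_mean_aggregator(metrics: Iterable[Metric], field_key: str, period: int) -> List[Metric]:
    """Aggregator: emit one new metric per series and time window."""
    windows: Dict[Tuple[str, Tuple[Tuple[str, str], ...], int], List[float]] = {}
    for m in metrics:
        if field_key in m.fields:
            start = m.timestamp - m.timestamp % period
            key = (m.name, tuple(sorted(m.tags.items())), start)
            windows.setdefault(key, []).append(m.fields[field_key])
    aggregated = []
    for (name, tags, start), values in windows.items():
        fields = {
            f"{field_key}_min": min(values),
            f"{field_key}_mean": sum(values) / len(values),
        }
        # The aggregate is stamped with the end of its window.
        aggregated.append(Metric(name, dict(tags), fields, start + period))
    return aggregated


if __name__ == "__main__":
    collected = [
        Metric("cpu", {"host": "host-a"}, {"usage_idle": 90.0}, 0),
        Metric("cpu", {"host": "host-a"}, {"usage_idle": 100.0}, 10),
    ]
    processed = add_tag_processor(collected, "dc", "us-west")
    summaries = min_mean_aggregator(processed, "usage_idle", period=60)
    # Keeping the unaggregated metrics alongside the aggregates is a choice.
    for metric in processed + summaries:
        print(metric)
```

The processor returns as many metrics as it receives, while the aggregator returns fewer, newly created metrics, which is why keeping or dropping the originals is a separate decision.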