Category Archives: Chronograf

Beta3 of Chronograf


As promised, we released an update to the Chronograf beta that includes bug fixes and some updates to the remaining features. In particular, we have added the ability to create your own queries outside of the query builder. This is useful if you want to build queries in time intervals outside of the standard set or if you prefer to type in your queries manually.

Continue reading Beta3 of Chronograf

Now in Beta: Chronograf, a complete open source monitoring solution running on the TICK Stack


Today we’re announcing the latest edition of Chronograf, the user interface of the TICK stack, and we’re moving the project to beta status. Over the past month, we have been quickly iterating on features and addressing issues based on user feedback. Key highlights include:

  • OAuth authentication via GitHub
  • Application templates for ElasticSearch, Varnish, and 22 other applications
  • Responsive design for the host view page
  • A number of other smaller bug fixes… refer to the change log for more details

As part of the beta release, we have recorded a video that walks through the key capabilities of Chronograf in less than 5 minutes. Continue reading Now in Beta: Chronograf, a complete open source monitoring solution running on the TICK Stack

Announcing the new Chronograf, a UI for the TICK stack & a complete open source monitoring solution


Today we’re releasing the first open source version of Chronograf, the user interface of the TICK stack. With this release we can now provide the entire stack as a complete open source monitoring solution. It’s part of our vision to enable users to own their monitoring on pure open source software that is as easy to set up and use as commercial SaaS offerings. It’s a continuation of our two primary drives as we build software: optimizing for developer happiness, and giving our users the fastest time to value with tools and solutions that are a joy to use.

Continue reading Announcing the new Chronograf, a UI for the TICK stack & a complete open source monitoring solution

Announcing InfluxDB, Telegraf, Kapacitor and Enterprise 1.0 RC1


The InfluxData team is excited to announce the 1.0 release candidate (RC1) of the Telegraf, InfluxDB, Chronograf, and Kapacitor projects, plus InfluxEnterprise, our clustering and monitoring product. These projects have been years in the making, so we are pleased to have finalized the API and all the features for the 1.0 release. From now until the 1.0 final release we’ll only merge bug fixes into the 1.0 branch.

Robust & Stable APIs

We’ve already done extensive testing on both the single server and clustered versions of InfluxDB, so these releases should be considered very stable. We’ll be rolling out the 1.0 RC to all of our customers in InfluxCloud over the next week or two.

Purpose Built for Time-Series

InfluxDB 1.0 will be shipping with our purpose-built storage engine for time series data, the Time Structured Merge tree. It gives incredible write performance while delivering better compression than generalized solutions. Depending on the shape of your data, each individual timestamp-value pair can take as little as 2 bytes, including all the tag and measurement metadata. Continuous queries and retention policies give users the ability to have the database manage the data lifecycle automatically by aggregating data for longer-term trends while dropping older high-precision data.

Continue reading Announcing InfluxDB, Telegraf, Kapacitor and Enterprise 1.0 RC1

Announcing InfluxDB 1.0 Beta – A Big Step Forward for Time-Series


The team at InfluxData is excited to announce the immediate availability of InfluxDB 1.0 Beta plus the rest of the components in the TICK stack: Telegraf, Chronograf, and Kapacitor. While there are many new features like exponential smoothing via Holt Winters queries, templates for Kapacitor TICKscripts, Telegraf Cloudwatch integration, and dozens of bug fixes, this release marks a significant point in the development of these projects.

First, we’ve had customers and community members running the TICK stack in production at significant scale for months, so we are comfortable that the quality of the codebase is worthy of the 1.0 moniker. Second, we’re ready to lock down the API and make a commitment to zero breaking changes for a significant length of time. This is especially important for organizations building products and services on top of the InfluxData stack whose products may have longer development cycles or require a higher degree of stability from the code base to ensure continuity for their customers and users.

Getting to 1.0 GA

This release is the first Beta of the upcoming 1.0 GA release. We still have some known bugs to fix, but from here until 1.0 we’ll be focused on testing, benchmarking and bug fixes. What about new features? They’ll come in the point releases after 1.0. For community members, this Beta is what you should be testing against. For some users, the Beta may even be suitable for production use. Many fixes have gone into all the projects since the 0.13 release nearly 4 weeks ago. Continue reading Announcing InfluxDB 1.0 Beta – A Big Step Forward for Time-Series

Announcing InfluxDB v0.13 Plus Clustering Early Access


The team at InfluxData is excited to announce the immediate availability of v0.13 of the TICK stack!

What’s the TICK stack? It’s InfluxData’s end-to-end platform for collecting, storing, visualizing and alerting on time-series data at scale. Learn more.

We are also pleased to announce early access to InfluxEnterprise. If you have been waiting for InfluxDB clustering to run on your infrastructure with the ability to rebalance nodes, plus a slick UI to deploy, manage and monitor the deployment, InfluxEnterprise is for you.

[Animation: enterprise_gif]

Like what you see? Contact us to request a demo and get early access plus pricing from one of our solutions architects.

InfluxDB v0.13

v0.13 is a significant InfluxDB release delivering robust features and stability to the single server experience. This is also our final major milestone before we launch 1.0. New in this release:

  • Upgrade to Go v1.6.2 delivering performance gains
  • Support for DELETE FROM
  • Time zone support: an offset argument in the GROUP BY time(…) call (see the example below)
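
For example, the second argument to time() shifts the boundaries of each interval. This is an illustrative query only; the measurement and field names are placeholders:

SELECT mean(value) FROM cpu WHERE time > now() - 12h GROUP BY time(1h, 30m)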

There’s a total of 27 new features and 32 bug fixes! Detailed release notes can be found here and downloads here.

Telegraf v0.13

New in this release:

  • New Input Plugin: “filestat” to gather metrics about file existence, size, etc.
  • HAProxy socket support
  • Support for Docker container IDs to track per container metrics

There’s a total of 17 new features and 21 bug fixes! Detailed release notes can be found here and downloads here.

Chronograf v0.13

New in this release:

  • Design update and performance improvements to the visualization index pages
  • “Visualization cards” have been replaced with more performant list items

[Image: chrono]
Downloads are here.

Kapacitor v0.13

InfluxDB’s native data processing and alerting engine now supports BYOA (bring your own algorithm) analytics. UDFs written in languages of your choice running in Docker containers can be launched independent of Kapacitor by exposing a Unix socket. On startup Kapacitor will connect to the socket and begin communication, applying the UDF logic to the time series data being processed.

New in this release:

  • UDFs can be connected over a Unix socket. This enables UDFs running in separate Docker containers to be applied to the Influx data pipeline.
  • CLI features to ID, list and delete tasks, recordings and replays
  • API lock down to a stable 1.0 version

There’s a total of 18 new features and 8 bug fixes! Detailed release notes can be found here and downloads here.

But wait, there’s more…

FREE Technical Papers

Today we also launched a new “Technical Papers” landing page on influxdata.com where developers and architects can download long reads that dive deep into a variety of InfluxDB related topics. Right now we’ve got three papers posted:

Make sure to check back often as we’ll be uploading new papers every other week!

FREE Virtual Training Summer Schedule is Live!

We’ve just added eight sessions to the virtual training schedule. In addition, we’ll be debuting three new topics including:

  • Benchmarking InfluxDB and Elasticsearch for Time-Series Data Management
  • Migrating from InfluxDB 0.8 and Up to 0.13
  • Intro to Kapacitor for Alerting and Anomaly Detection

What’s Next?

The Chronograf Files: The Curious Case of JavaScript’s `sort`


“The world is full of obvious things which nobody by any chance ever observes.”
— Sir Arthur Conan Doyle, The Hound of the Baskervilles

It was your average day at the InfluxData offices. The usual sounds filled the air: the clack of a ping pong ball, the ever-present drip of an overworked coffee pot. A day just like any other — or so I thought.

It all started when a new Chronograf ticket came in, where a user had reported their graphs behaving strangely.

The Bug

I give you ‘Exhibit A’, a screenshot submitted by our affected user:

[Screenshot: chrono-out-of-order]

Though this graph is quite the visual cacophony, the issue was clear immediately: Chronograf was plotting this user’s data out of order. For a product like Chronograf, this is about as critical as a bug can be. Because we’re in the business of time series data, we’d better be sure we’re respecting the laws of physics and that time continues to function as it has since the beginning of, well, time.

At this point, Chronograf had seen several releases over the better part of a year. How had we not seen behavior like this before?

The user also provided us with the sample dataset that resulted in the above graph, hereby dubbed ‘Exhibit B’. The dataset itself was large so I won’t reproduce it in full, but here is a small sample we’ll use for investigative purposes:
`timestamp value
——— ——
100000000 4
200000000 9
300000000 6
400000000 7
500000000 6
600000000 4
700000000 9
800000000 7
900000000 10
1000000000 6
1100000000 4
…`

Note: InfluxDB supports multiple time formats, though in this case we’re using Unix time. Not to spoil too much, but this will be highly relevant soon!

At first glance, the data seems reasonable. The values seem OK, and the timestamps are ordered chronologically. No cause for alarm, right?

The Fix

“Wait, you said `sort` does WHAT??”
— Anonymous

Now I had exhibit A, a graph which explained how the bug affected the end-user. And I had exhibit B, a sample dataset I used to accurately reproduce the issue.

The next step was to ask myself a few questions which I’ve found helpful when beginning the debugging process:

1) What sections of the code does this bug touch? Who is the most likely offender?

I immediately thought of a piece of code that converts raw InfluxDB responses, which use JSON formatting, into data structures that our graphing library understands. As far as I could tell, the sample dataset was ordered correctly. It had to be something farther down the line.

2) Are there any obvious anomalies?

As far as plotting data chronologically, Chronograf seemed to function properly in all other known cases. Given the graph we saw earlier, it’s also the kind of bug you feel like you’d tend to notice.

Here is where it starts to get good: the test data was noteworthy because the first point starts on day one of what Unix considers to be the dawn of the universe. Unix timestamps represent the amount of time (usually in seconds or milliseconds) that has passed since 00:00:00 UTC, January 1, 1970.

For example, the ‘Unix time’ as of me finishing this sentence is `1458851975430`. On its own, maybe this isn’t a particularly exciting or interesting detail. It’s test data, who cares if we’re plotting points from 1970 or 2016? We do, as it turns out!

I took a hard look at the data after it had gone through the conversion process, immediately before it made its way into our graphing library. I confirmed it was out of order. Using the same small subset of data from the last section, here’s a view of the test data as it appeared to me:

`timestamp value
——— ——
100000000 4
1000000000 6
1100000000 4
200000000 9
300000000 6
400000000 7
500000000 6
600000000 4
700000000 9
800000000 7
900000000 10
…`

Wait, what? Why on earth is `1000000000` (9 zeroes) appearing before `200000000` (8 zeroes)? This new insight was enough for me to locate the perpetrator, a line that looked something like this:

`// allTimestamps is a list of unique timestamps,
// e.g. [100000000, 200000000, 300000000 …]
allTimestamps.sort()`

I remembered why this line was initially added: if for any reason the data came into the converter out of order, or if merging multiple series was required, we wanted to ensure that our data was chronological before trying to plot it on a graph.

And suddenly it all made sense. Let’s try something…

If you’re on a device with a keyboard, open your browser’s developer console. On Chrome you can choose View -> Developer -> JavaScript Console. Type in the following and hit enter: `[2, 10, 5].sort()`

Does the result surprise you at all? I imagine it does, as this is the output you should have seen:

`[2, 10, 5].sort()
// => [10, 2, 5]`

It boils down to this: JavaScript’s built-in `sort` function, by default, sorts its items ‘lexicographically’, i.e. alphabetically. As far as JavaScript is concerned, ‘10’ comes before ‘2’ because the comparison is done character by character on the string representations. From this perspective, the output we just saw makes sense. Is this the default behavior we expected when sorting a list of numbers? Not in the slightest.

The fix itself was simple. Instead of using `sort` without any arguments, we provided what Mozilla calls a `compareFunction`.

If you ran the previous example in your browser’s console, give this one a try:

`[2, 10, 5].sort((a, b) => a - b)`

Better, right?
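
In Chronograf’s converter, the eventual change looked roughly like this (a sketch rather than the exact source, reusing the hypothetical `allTimestamps` list from the earlier snippet):

`// Sort numerically instead of lexicographically before handing the
// timestamps to the graphing library.
allTimestamps.sort((a, b) => a - b)
// e.g. [100000000, 1000000000, 200000000] => [100000000, 200000000, 1000000000]`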

OK, now we know how `sort` works. We know what to fix, but we still haven’t fully answered our original question: why did we have to wait so long for a bug like this to make an appearance? The answer lies in the number of significant digits in each timestamp. With the provided test data, we had multiple rollovers where the timestamps not only grew in numerical size, they grew enough to require another significant digit to represent them.

We can illustrate this by using the test data, and it only requires a few points:

`timestamp value
——— ——
800000000 4
900000000 6
1000000000 4`

Try sorting these timestamps without a `compareFunction` and see the results:

`[800000000, 900000000, 1000000000].sort()
// => [1000000000, 800000000, 900000000]`

`1000000000` appears at the beginning of the list, which is technically correct according to JavaScript. But because it has an extra digit, it results in non-chronological data. Most of the data being fed into Chronograf is recent, typically from the last several years. Enough time has passed since 1970 (the beginning of Unix time) to the point where it will be quite a while before Unix timestamps need another digit to be represented. If we’re using milliseconds, Nov 21, 2286 in fact.
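
You can sanity-check that claim in the console; `9999999999999` is the largest 13-digit millisecond timestamp:

`new Date(9999999999999)
// => a date in late November 2286, the last instant before
// millisecond timestamps need a 14th digit`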

During normal Chronograf usage, if InfluxDB was returning Unix timestamps, they were of the same length in virtually all cases. The fact that JavaScript was treating these numbers as strings when trying to sort them was irrelevant — we’d get the same results regardless of whether we told JavaScript to explicitly sort them as numbers.

One last experiment for the console, where the results should be identical:
`// Using three realistic InfluxDB timestamps:
[1459444040000, 1459444050000, 1459444060000].sort()
[1459444040000, 1459444050000, 1459444060000].sort((a, b) => a - b)`

Chronograf was dealing with timestamps like this in virtually all cases, explaining why it took a unique dataset for us to track down this particular bug.

Voila! Another case in the books.

The Lesson

“History is not a burden on the memory but an illumination of the soul.”
— Lord Acton

As a developer, this experience was fantastic. It was a fun, unique, mildly infuriating bug with no shortage of teaching potential. Here were my big takeaways:

  • A deep understanding of your tools will make you a better *insert thing here*. This idea spans all fields and disciplines, though it is especially important in software. The `sort` bug was able to bite us because of a gap in our knowledge, and it serves as inspiration for learning even more about how JavaScript, our most valuable tool, works. It will help inform decisions we make in the future, and maybe even prevent a few bugs.
  • Critical bugs lurk beneath the surface, and often manifest themselves under the strangest of circumstances.

For us to finally see `sort` behave in a way we didn’t expect, we needed a dataset that used a specific time format and spanned two very specific years, 1970-1972. I’ve found that software never ceases to amaze with the sheer multitude of ways in which it can break. In this regard I have a begrudging respect for its raw and unbridled creativity.

If you’re interested in a more in-depth look at how `sort` works under the hood, this article from javascriptkit.com was invaluable.

What’s next? Get Started With Chronograf!

Which now sorts numerical timestamps with aplomb!

If you’d like to try Chronograf, the ‘C’ in the TICK stack, you can download the latest version here. We love reading and appreciate any and all feedback, so let us know what you think. Whether it’s bug reports, feature requests, general experience improvements, or any other thoughts or ideas you have about the product, don’t hesitate to email the team: chronograf@influxdata.com.

Part 7: How-to Create an IoT Project with the TICK Stack on the Google Cloud Platform


Part 7 : Collecting System Sensor Data with Telegraf

The last part of this tutorial looks at Telegraf, the “T” in the TICK Stack. Telegraf is an agent that is used to collect metrics from various input channels and write them to output channels. It supports over 60 plugins which can function as the input source or output target of data.

The agent is completely plugin-driven, and while it supports many plugins out of the box, you can write your own plugins too.

Our tutorial so far has looked at collecting temperature data from multiple weather stations and persisting it in InfluxDB. In addition to that, we also looked at setting up Chronograf to view the temperature data via a dashboard, and at setting up alerts via Kapacitor that pushed notifications to Slack in case the temperature went over a certain limit.

At this point, the temperature data is being collected via the Raspberry Pi stations and the flow is pretty much in place. The area where we will utilize Telegraf is in monitoring the CPU, memory and other system parameters of the InfluxDB server.

  • Telegraf comes with an input plugin named `system`. This plugin captures various metrics about the system that it is running on, like memory usage, CPU, disk usage and more. We shall use this plugin to capture the cpu and memory metrics on the InfluxDB server.
  • The captured input metrics need to be sent to an output system. In our case, we will push this data into InfluxDB itself. This will capture these metrics in an InfluxDB database on which we could potentially build out dashboards and alerts via Chronograf and Kapacitor. Sounds neat. The output plugin therefore will be InfluxDB.

The diagram below depicts what we are going to do:

[Diagram: tele1]
Installing Telegraf

We are going to install Telegraf on the InfluxDB Server instance. Currently we just have one instance running in the Google Cloud and we will be setting it up on that.

As mentioned earlier, the VM runs Debian Linux, so we can follow the steps for installing Telegraf given at the official documentation site. The instructions for installing the latest distribution of Telegraf are as follows:

wget http://get.influxdb.org/telegraf/telegraf_0.10.2-1_amd64.deb

sudo dpkg -i telegraf_0.10.2-1_amd64.deb

Configuring Telegraf

We need to provide a configuration file to Telegraf. This configuration file will contain not just Agent configuration parameters but also the input and output plugins that you wish to configure.

There are a ton of input and output plugins that Telegraf supports, and it does provide a command to generate a telegraf.conf (configuration file) containing all the input and output plugin configuration sections. That is a useful reference to keep around, but it is more than we need here.

We will be using the following generic command to generate a Telegraf configuration file for us:

telegraf -sample-config -input-filter <pluginname>[:<pluginname>] -output-filter <outputname>[:<outputname>] > telegraf.conf

In our case, we want the cpu and mem input plugins and the influxdb output plugin, so we generate a `telegraf.conf` as shown below:

telegraf -sample-config -input-filter cpu:mem -output-filter influxdb > telegraf.conf

Let us look at the key sections in the generated `telegraf.conf` file:

  • [agent] : This is the section for the Telegraf agent itself. Ideally we do not want to tweak too much here. Do note that you could change the frequency (time interval) at which the data collection is done for all inputs via the `interval` property.
  • The next section is one or more `outputs`. In our case, it is just the `influxdb output`, i.e. `[[outputs.influxdb]]`. Two properties are key here: urls and database. The urls property is a list of InfluxDB instances. In our case there is just one, and we are running Telegraf on the same machine as the InfluxDB instance, so the endpoint points to the InfluxDB API endpoint at `http://localhost:8086`. Similarly, the database property is the database in which the input metrics will be collected. By default it is set to `telegraf`, but you can change it to another one. I will go with the default.
  • The next sections are for the inputs. You can see that it has created the `[[inputs.cpu]]` and `[[inputs.mem]]` inputs. Check out the documentation for both cpu and mem inputs.
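
Putting these together, a trimmed-down sketch of what the generated `telegraf.conf` can look like is shown below (the exact sections and defaults depend on your Telegraf version):

[agent]
  interval = "10s"            # how often all inputs are polled

[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "telegraf"

[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]
  # no options required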

Starting Telegraf and collecting metrics

Let us start the Telegraf Agent now via the following command:

telegraf -config telegraf.conf

We could have pushed the generated `telegraf.conf` into the `/etc/telegraf` folder and started Telegraf as a service, but for the purposes of this tutorial, running it in the foreground is fine.
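
If you do want to run it as a service instead, something along these lines should work on this Debian setup (the paths and service name are assumed from the package defaults):

sudo cp telegraf.conf /etc/telegraf/telegraf.conf
sudo service telegraf restart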

On successful startup, it displays an output as shown below:

$ telegraf -config telegraf.conf
2016/02/15 04:36:39 Starting Telegraf (version 0.10.2)
2016/02/15 04:36:39 Loaded outputs: influxdb
2016/02/15 04:36:39 Loaded inputs: cpu mem
2016/02/15 04:36:39 Tags enabled: host=instance-1
2016/02/15 04:36:39 Agent Config: Interval:10s, Debug:false, Quiet:false, Hostname:"instance-1", Flush Interval:10s

Recall that one of the properties for the Telegraf agent was the `interval` property, which was set to 10 seconds. This is the interval at which it polls all the inputs for data.

Here is the output from several data collection intervals:

2016/02/15 04:36:40 Gathered metrics, (10s interval), from 2 inputs in 531.909µs
2016/02/15 04:36:50 Gathered metrics, (10s interval), from 2 inputs in 447.937µs
2016/02/15 04:36:50 Wrote 4 metrics to output influxdb in 3.39839ms
2016/02/15 04:37:00 Gathered metrics, (10s interval), from 2 inputs in 482.658µs
2016/02/15 04:37:00 Wrote 3 metrics to output influxdb in 4.324979ms
2016/02/15 04:37:10 Gathered metrics, (10s interval), from 2 inputs in 775.612µs
2016/02/15 04:37:10 Wrote 3 metrics to output influxdb in 7.472159ms
2016/02/15 04:37:20 Gathered metrics, (10s interval), from 2 inputs in 438.388µs
2016/02/15 04:37:20 Wrote 3 metrics to output influxdb in 3.219223ms
2016/02/15 04:37:30 Gathered metrics, (10s interval), from 2 inputs in 419.607µs
2016/02/15 04:37:30 Wrote 3 metrics to output influxdb in 3.159644ms
2016/02/15 04:37:40 Gathered metrics, (10s interval), from 2 inputs in 426.761µs
2016/02/15 04:37:40 Wrote 3 metrics to output influxdb in 3.894155ms
2016/02/15 04:37:50 Gathered metrics, (10s interval), from 2 inputs in 449.508µs
2016/02/15 04:37:50 Wrote 3 metrics to output influxdb in 3.192695ms
2016/02/15 04:38:00 Gathered metrics, (10s interval), from 2 inputs in 498.035µs
2016/02/15 04:38:00 Wrote 3 metrics to output influxdb in 3.831951ms
2016/02/15 04:38:10 Gathered metrics, (10s interval), from 2 inputs in 448.709µs
2016/02/15 04:38:10 Wrote 3 metrics to output influxdb in 3.246991ms
2016/02/15 04:38:20 Gathered metrics, (10s interval), from 2 inputs in 514.15µs
2016/02/15 04:38:20 Wrote 3 metrics to output influxdb in 3.838368ms
2016/02/15 04:38:30 Gathered metrics, (10s interval), from 2 inputs in 520.263µs
2016/02/15 04:38:30 Wrote 3 metrics to output influxdb in 3.76034ms
2016/02/15 04:38:40 Gathered metrics, (10s interval), from 2 inputs in 543.151µs
2016/02/15 04:38:40 Wrote 3 metrics to output influxdb in 3.917381ms
2016/02/15 04:38:50 Gathered metrics, (10s interval), from 2 inputs in 487.683µs
2016/02/15 04:38:50 Wrote 3 metrics to output influxdb in 3.787101ms
2016/02/15 04:39:00 Gathered metrics, (10s interval), from 2 inputs in 617.025µs
2016/02/15 04:39:00 Wrote 3 metrics to output influxdb in 4.364542ms
2016/02/15 04:39:10 Gathered metrics, (10s interval), from 2 inputs in 517.546µs
2016/02/15 04:39:10 Wrote 3 metrics to output influxdb in 4.595062ms
2016/02/15 04:39:20 Gathered metrics, (10s interval), from 2 inputs in 542.686µs
2016/02/15 04:39:20 Wrote 3 metrics to output influxdb in 3.680957ms
2016/02/15 04:39:30 Gathered metrics, (10s interval), from 2 inputs in 526.083µs
2016/02/15 04:39:30 Wrote 3 metrics to output influxdb in 4.32718ms
2016/02/15 04:39:40 Gathered metrics, (10s interval), from 2 inputs in 504.632µs
2016/02/15 04:39:40 Wrote 3 metrics to output influxdb in 3.676524ms
2016/02/15 04:39:50 Gathered metrics, (10s interval), from 2 inputs in 640.896µs
2016/02/15 04:39:50 Wrote 3 metrics to output influxdb in 3.773236ms
2016/02/15 04:40:00 Gathered metrics, (10s interval), from 2 inputs in 491.794µs
2016/02/15 04:40:00 Wrote 3 metrics to output influxdb in 3.608919ms
2016/02/15 04:40:10 Gathered metrics, (10s interval), from 2 inputs in 571.12µs
2016/02/15 04:40:10 Wrote 3 metrics to output influxdb in 3.739155ms
2016/02/15 04:40:20 Gathered metrics, (10s interval), from 2 inputs in 505.122µs
2016/02/15 04:40:20 Wrote 3 metrics to output influxdb in 4.151489ms

Since we have the InfluxDB server running along with the Admin interface endpoints, we can investigate the `telegraf` database from the Admin interface itself (you could have done that via the InfluxDB shell too!).

[Screenshot: tele2]
Here are some of the `cpu` measurement records:

[Screenshot: tele3]
Here are some of the `mem` measurement records:

[Screenshot: tele4]
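If you prefer the command line, a roughly equivalent check from the influx shell might look like this (assuming the default `telegraf` database and the measurements created by the cpu and mem plugins):

$ influx
> USE telegraf
> SELECT * FROM cpu LIMIT 5
> SELECT * FROM mem LIMIT 5
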
As a next step, you could hook in visualization (Chronograf) or alerts (Kapacitor) into this Telegraf database.

Conclusion

This concludes the 7-part tutorial on using the TICK stack from InfluxData. The TICK stack provides a best-in-class set of components to build modern and extensible solutions on a time series database. We hope this tutorial gave you a glimpse into its potential and gets you started creating winning applications.

What’s next?

  • Get started with InfluxDB here.
  • Looking to level up your InfluxDB knowledge? Check out our economically priced virtual and public trainings.

Part 6: How-to Create an IoT Project with the TICK Stack on the Google Cloud Platform


Part 6 : Setting up Alerts with Kapacitor

In this part, we are going to take a look at Kapacitor, the “K” in the TICK stack. Kapacitor is a stream and batch processing engine that acts as both a data processor and an alerting engine. In our case, we are going to use it in the following way:

  • Define an Alert that monitors the temperature data and checks if it crosses a threshold of 30 degrees Celsius.
  • If the temperature reported is greater than that, we would like to log this record in a file and also raise a notification in our Slack channel.

Kapacitor’s capabilities go well beyond that: it comes with a sophisticated engine to detect data patterns and funnel the results to the multiple channels it supports out of the box. In our case, logging the high temperature in a file and raising a notification via Slack are just a couple of the integrations it can do.

So let’s get started with setting up Kapacitor and seeing our temperature alert in action.

Installing Kapacitor

We are going to run Kapacitor on the same instance as InfluxDB. This instance is running on the Google Cloud, so the best way to install the software is by SSH’ing into the instance.

To set up Kapacitor into our VM instance (the instance running InfluxDB), we will need to SSH into the instance. Follow these steps:

  • Login to Google Developers Console and select your project.
  • From the sliding menu on top, go to Compute -> Compute Engine -> VM Instances
  • You will see your VM instance listed.
  • Look out for the SSH button at the end of the row.
  • Click that and wait for the SSH session to get initialized and set up for you. If all is well, you should see another browser window that will transport you to the VM instance as shown below:

[Screenshot: k1]
The next thing is to install Kapacitor. Since this is the Debian Linux image we opted for at the time of creating the VM, we can follow the steps for installing Kapacitor as given at the official documentation site.

wget https://s3.amazonaws.com/kapacitor/kapacitor_0.10.1-1_amd64.deb
sudo dpkg -i kapacitor_0.10.1-1_amd64.deb

On successful installation, you will have two applications that we will be using:

  • `kapacitord` : This is the Kapacitor daemon that will need to be running to process the data coming into InfluxDB.
  • `kapacitor` : This is the CLI that we will use to talk to the kapacitord daemon and set up our tasks, etc.

Generating a default Configuration file

Kapacitor is a powerful product with many configuration options, which makes it challenging to write an initial configuration file from scratch. To make this easier, we can have the kapacitord application generate a default configuration file for us.

We go ahead and generate a default configuration file, `kapacitor.conf`, as shown below:

$ kapacitord config > kapacitor.conf

The Configuration file (kapacitor.conf) has multiple configuration sections including connection to InfluxDB, the various channels that one can configure and more.

Here are few configuration sections of interest in the `kapacitor.conf` file:

  • `[http]` : This is the API Endpoint that kapacitord will expose and which the Kapacitor client will communicate to.
  • `[influxdb]` : On startup, Kapacitor sets up multiple subscriptions to InfluxDB databases by default. This section has various configuration properties for connecting to the InfluxDB instance. You will notice that it is a localhost URL, since the InfluxDB instance is running on the same machine.
  • `[logging]` : This section has the default logging level. This can be changed if needed via the Kapacitor client.
  • `[slack]` : The section that we are interested in for this tutorial, since we want to get notified via Slack. The various properties include the channel in Slack that we want to post the message to, the incoming Webhook URL for the Slack Team, etc. We shall look at this a little later in this tutorial, when we set up the Slack Incoming Webhook Integration.

Start the Kapacitor Service

We do not make any changes to our `kapacitor.conf` file at the moment. We simply launch the Kapacitor Service as shown below:

$ kapacitord -config kapacitor.conf

This starts up the Kapacitor service, and you will notice towards the end of the console logging that a bunch of subscriptions are set up, including one on the temperature_db database that we are interested in.

Kapacitor Client

The Kapacitor CLI (kapacitor) is the client application that you will be using to communicate with the Kapacitor daemon. You can use the client not just to configure alerts and enable/disable them, but also to check on their status and more.

One of the ways to check if there are any tasks set up for Kapacitor is via the list command. We can fire that as shown below:

$ kapacitor list tasks
Name       Type      Enabled   Executing Databases and Retention Policies

This shows that currently there are no tasks configured.

Create the High Temperature Alert Task via TICKscript

The next thing we are going to do is set up the Task to detect if the temperature from Station S1 is 30 degrees Celsius or higher. The Task Script is written in a DSL called TICKscript.

The TICKscript for our High Temperature alert is shown below:

stream
   .from()
   .database('temperature_db')
   .measurement('temperature')
   .where(lambda:"Station" == 'S1')
   .alert()
   .message('{{index .Tags "Station" }} has high temperature : {{ index .Fields "value" }}')
   .warn(lambda:"value" >= 30)
   .log('/tmp/high_temp.log')

The script is intuitive enough to read, as described below:

  • We are working in stream mode, which means that Kapacitor subscribes to a realtime data feed from InfluxDB, versus batch mode, where Kapacitor queries InfluxDB in batches.
  • We then specify which InfluxDB database via database(). This will monitor the stream of data going into our temperature_db database.
  • A filter is specified for the Tag Station. The value that we are interested in is the “S1” station.
  • For the above criteria, we would like to get alerted only if the measurement value is 30 or higher.
  • If the value is greater than 30, then we would like to log that data in a temporary file at this point (we will see the Slack integration in a while).
  • The message that we would like to capture (i.e. a custom message) is also specified, e.g. “S1 has high temperature : 30.5”.

We save the above TICKscript in `temperature_alert.tick` file.

Configure the High Temperature Alert Task

The next step is to use the Kapacitor client to define this task and make it available to Kapacitor. We do that via the define command as shown below:

$ kapacitor define \
-name temp_alert \
-type stream \
-tick temperature_alert.tick \
-dbrp temperature_db.default

Notice the following parameters:

  • We name our Task as temp_alert.
  • We specify that we want to use this in stream mode.
  • We specify the TICKscript file : temperature_alert.tick.
  • The Database Retention Policy is selected as the default one (infinite duration and a replication factor set to the number of nodes in the cluster) from the temperature_db database.

We can now look at the tasks that the Kapacitor Service knows about as given below:

$ kapacitor list tasks

Name                 Type      Enabled   Executing  Databases and Retention Policies
temp_alert           stream    false     false      ["temperature_db"."default"]

You can see that the Enabled and Executing properties are set to false.

Dry Run : Temperature Alert

One of the challenges that you face while developing an Alerting system is to test it out before it goes into Production. A great feature of Kapacitor is to do a dry run of the Alert based on a snapshot/recording of data.

The steps are straightforward and at a high level we have to do the following:

  • Make sure that the Alert (temp_alert) is not enabled. We verified that in the previous section.
  • We ask Kapacitor to record a stream of data that is coming into InfluxDB for a given interval of time (say 20 seconds or 30 seconds). While recording this data, we ensure that some of the data coming in meets the condition to fire the Alert as we are expecting. In our case, if the temperature is above or equal to 30, then it should log the data.
  • Kapacitor records the data in the defined time interval above and gives us a recording id.
  • We then replay that data and tell Kapacitor to run it across the Alert (temp_alert) that we have defined.
  • We check if our TICKscript associated with the alert is working fine by checking our log file (/tmp/high_temp.log) for any entries.
  • If the Test runs fine, we will then enable the task.

Let’s get going on this. Our `temp_alert` task is not yet enabled, i.e. the value of the `Enabled` attribute is false, as we saw in the `kapacitor list tasks` output.

The next step is to ask Kapacitor to start recording the data for our alert. We ask it to record the data in stream mode (the other options are batch and query). We specify the duration as 30 seconds and also specify the task name `(temp_alert)`.

kapacitor record stream -name temp_alert -duration 30s

This will make Kapacitor record the live stream of data for 30 seconds using the database and retention policy from the task specified. If your data is streaming in, give it a total of 30 seconds to record it. Alternatively, you can also generate INSERT statements using the influx client (an example is shown below).

Just ensure that the time interval from the first INSERT to the last INSERT is equal to or more than the specified duration (30 seconds), whether you send data via manual INSERT statements or it is streaming in.
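
For example, from the influx shell you could write a few points by hand while the recording is in progress (a sketch using the measurement, tag and field names from the earlier parts of this tutorial):

$ influx
> USE temperature_db
> INSERT temperature,Station=S1 value=29
> INSERT temperature,Station=S1 value=31
> INSERT temperature,Station=S1 value=32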

The above command will complete and will output a recording id, an example of which is shown below:

`fbd79eaa-50c5-4591-bbb0-e76f354ef074`

You can check if the recordings are available in Kapacitor by using the following command:

kapacitor list recordings <recording-id>

A sample output is shown below:

ID                                      Type    Size      Created
fbd79eaa-50c5-4591-bbb0-e76f354ef074    stream  159 B     17 Feb 16 22:18 IST   

A size greater than zero indicates that the data was recorded. Now, all we need to do is replay the recorded data against the Alert that we have specified. The -fast parameter is provided to replay the data as fast as possible and not wait for the entire duration over which the data was recorded (in our case 30 seconds).

kapacitor replay -id $rid -name temp_alert -fast

where `$rid` is a variable that contains the value of the `Recording Id`.
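
For instance, you might capture the id when you kick off the recording and then reuse it (an illustrative shell snippet, not from the original post; it assumes `kapacitor record` prints only the recording id):

rid=$(kapacitor record stream -name temp_alert -duration 30s)
kapacitor replay -id $rid -name temp_alert -fast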

The data I used during the recording phase contained values over 30 degrees Celsius for some of the records, which is exactly what should fire the Alert and cause those records to be written to the `/tmp/high_temp.log` file.

On checking the file `/tmp/high_temp.log` for entries, we do notice the entries as shown below:

$ cat /tmp/high_temp.log
{"id":"temperature:nil","message":"S1 has high temperature : 31", … }
{"id":"temperature:nil","message":"S1 has high temperature : 32", … }
{"id":"temperature:nil","message":"S1 has high temperature : 31”, … }

Enable the Task

Now that we have validated that our Alert is working fine, we need to go live with it. This means we need to enable the task as shown below:

$ kapacitor enable temp_alert

You can now check up on the details of your task via the `show` command as shown below:

$ kapacitor show temp_alert

This will print out details on the task along with the TICKscript for the Task as given below:

Name: temp_alert
Error:
Type: stream
Enabled: true
Executing: true
Databases Retention Policies: ["temperature_db"."default"]
TICKscript:
stream
   .from()
   .database('temperature_db')
   .measurement('temperature')
   .where(lambda:"Station" == 'S1')
   .alert()
   .message('{{index .Tags "Station" }} has high temperature : {{ index .Fields "value" }}')
   .warn(lambda:"value" >= 30)
   .log('/tmp/high_temp.log')

DOT:
digraph temp_alert {
stream0 -> stream1 [label="0"];
stream1 -> alert2 [label="0"];
}

Note that the `Enabled` and `Executing` properties are now true.

High Temperature Alert in Action

If the temperature values are coming in, the Task will be executed and the record will be written to the log file. A specific record from the `/tmp/high_temp.log` file is shown below:

{"id":"temperature:nil","message":"S1 has high temperature : 30","time":"2016-01-22T06:37:58.83553813Z","level":"WARNING","data":{"series":[{"name":"temperature","tags":{"Station":"S1"},"columns":["time","value"],"values":[["2016-01-22T06:37:58.83553813Z",30]]}]}}

Notice that the message attribute has the message along with other tags, values and timestamp.

This confirms that our High Temperature Alert Task has been setup correctly and is working fine. The next thing to do is to set up the Slack Channel Notification.

Slack Incoming Hook Integration

The Slack API provides multiple mechanisms for external applications to integrate with it. One of them is the Incoming Webhooks integration. Via this mechanism, external applications can post a message to a particular channel or a user inside a Slack Team.

Kapacitor supports posting messages to your Slack Team via this mechanism, so all we need to do is provide the details to the Kapacitor configuration, specify the slack notification in our TICKscript and we are all set.

Enable Slack Channel

The first step is to enable this integration inside of your Slack Team. To do that, we will assume that you are logged in to your Slack Team and you are the Administrator.

Go to Slack App Directory and click on Make a Custom Integration as shown below:

[Screenshot: k2]

This will bring up a list of Custom Integrations that you can build for your team and we will select the Incoming WebHooks as shown below:

[Screenshot: k3]

We want the message to be posted to the #general channel, so we select that channel and click on the Add Incoming WebHooks integration.

[Screenshot: k4]

This completes the WebHooks setup and it will lead you to the details page for the integration that you just set up. This page contains the Webhook URL that you need to note down. Kapacitor just needs this information so that it can post the JSON payload data to Slack, which in turn will deliver it to your #general channel.

[Screenshot: k5]

Configuring Slack Channel in Kapacitor Configuration file

The next thing that we need to do is go back to the `kapacitor.conf` file that our Kapacitor service was using.

In that file, you will find the `[slack]` configuration section, which we fill out as follows:

[slack]
 enabled = true
 url     = "https://hooks.slack.com/services/<rest of Webhook URL>"
 channel = "#general"
 global  = false

Notice that the Webhook URL that we got from the previous section is set for the url property. We also enable this channel, specify the channel (`#general`) to post to, and set global to false, since we would like to explicitly enable the Slack integration in our TICKscript.

Save this file and restart the Kapacitor service again.

You should see the last few lines in the startup console as shown below:

[udp:temperature_db.default] 2016/01/22 06:46:53 I! Started listening on UDP: 127.0.0.1:35958
[influxdb] 2016/01/22 06:46:53 I! started UDP listener for temperature_db default
[task_master] 2016/01/22 06:46:53 I! Started task: temp_alert

Notice that the listener has been started for our temperature_db database and our task has also been started.

Add Slack Channel to TICKscript

We have not yet modified our TICKscript, which only logged the high temperature to a file. We will add the Slack channel now.

Open up the `temperature_alert.tick` file in an editor and add the additional `.slack()` line as shown below:

stream
   .from()
   .database('temperature_db')
   .measurement('temperature')
   .where(lambda:"Station" == 'S1')
   .alert()
   .message('{{index .Tags "Station" }} has high temperature : {{ index .Fields "value" }}')
   .warn(lambda:"value" >= 30)
   .slack()
   .log('/tmp/high_temp.log')

Save the `temperature_alert.tick` file.

Reload Task

We will now reload the Task again because we have changed the script. To do that, you have to define the task again (use the same name) as shown below. The `define` command will automatically reload an enabled task:

$ kapacitor define -name temp_alert -tick temperature_alert.tick

Slack Channel Notification

We are all set now to receive the Slack notification. If the temperature data is streaming in and the temperature value is greater than 30 degrees Celsius, you will see a notification in Slack. Shown below is a sample record in our #general channel:

[Screenshot: k6]
This concludes the integration of Kapacitor into our IoT sensor application.

What’s next?

  • In part seven, we will explore how to use Telegraf to collect system data about our temperature sensors. Follow us on Twitter @influxdb to catch the next blog in this series.
  • Looking to level up your InfluxDB knowledge? Check out our economically priced virtual and public trainings.