Category Archives: Kapacitor

Kubernetes monitoring and autoscaling with Telegraf and Kapacitor


With the 1.1 release of Telegraf and Kapacitor, InfluxData is improving the ease of use, depth of metrics, and level of control it provides for maintaining and monitoring a Kubernetes cluster. InfluxDB has been a part of Kubernetes monitoring since v0.4, when the first version of Heapster was released, and InfluxDB remains the default data sink.

Continue reading Kubernetes monitoring and autoscaling with Telegraf and Kapacitor

Announcing Kapacitor 1.0 – A Data Processing Engine for InfluxDB


Kapacitor 1.0 GA is here. Kapacitor is the brains of the TICK stack. You can leverage Kapacitor to process your data for various business needs and use it to find changes or anomalies within your time series data. We have come a long way since the 0.13.1 release of Kapacitor back in mid-May 2016. Since then, we’ve added 33 new features and 42 bug fixes. We had many PRs from the community, ranging from simple bug fixes to major features. Thanks to everyone for helping improve Kapacitor!

What’s new?

Of the new features in 1.0, there are eight we want to highlight, three of which are community contributions:

  • HTTP based subscriptions: Goodbye to the complexity of managing a unique UDP port per database — HTTP subscriptions are here and provide a simpler, more reliable transport from InfluxDB to Kapacitor.
  • Template Tasks: You can now define templates so that multiple tasks that share common behavior can easily be managed together.
  • Holt-Winters Forecasting: Start using Kapacitor for more complex anomaly detection tasks; Holt-Winters is the first of many powerful algorithms to be added to the TICK stack.
  • Alert Reset Expressions: Noise from alerts is a plague. Thanks to @minhdanh, you can now define reset expressions for your alert levels to reduce alert noise.
  • Group By Fields: Convert any field into a tag so that you can use it to group your data streams.
  • Telegram Alerting: You can now send messages to Telegram thanks to @burdandrei.
  • Better and Faster Lambda Expressions: @yosiat greatly improved the performance of lambda expressions and added support for conditional logic.
  • Live Replays: Replay data directly from InfluxDB without going through any intermediate steps.
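To sketch how an alert reset expression might be used, here is an illustrative TICKscript alerting on low idle CPU. The measurement, thresholds, and reset lambdas are assumptions for the sake of the example, not taken from the release notes:

```tickscript
stream
    |from()
        .measurement('cpu')
    |alert()
        // Trigger warn/crit when idle CPU drops too low...
        .warn(lambda: "usage_idle" < 20)
        .crit(lambda: "usage_idle" < 5)
        // ...but only clear each level once the reset condition holds,
        // so the alert does not flap around the trigger threshold.
        .warnReset(lambda: "usage_idle" > 30)
        .critReset(lambda: "usage_idle" > 10)
        .log('/tmp/cpu_alerts.log')
```

The gap between the trigger and reset thresholds is what absorbs the noise: a value oscillating between 18 and 22 stays in the warn state instead of firing repeatedly.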

Continue reading Announcing Kapacitor 1.0 – A Data Processing Engine for InfluxDB

Announcing InfluxDB, Telegraf, Kapacitor and Enterprise 1.0 RC1


The InfluxData team is excited to announce the 1.0 release candidate (RC1) of the Telegraf, InfluxDB, Chronograf, and Kapacitor projects and of InfluxEnterprise, our clustering and monitoring product. These projects have been years in the making, so we are pleased to have finalized the API and all the features for the 1.0 release. From now until the 1.0 final release we’ll only merge bug fixes into the 1.0 branch.

Robust & Stable APIs

We’ve already done extensive testing on both the single server and clustered versions of InfluxDB, so these releases should be considered very stable. We’ll be rolling out the 1.0 RC to all of our customers in InfluxCloud over the next week or two.

Purpose Built for Time-Series

InfluxDB 1.0 will ship with our purpose-built storage engine for time-series data, the Time Structured Merge tree. It gives incredible write performance while delivering better compression than generalized solutions. Depending on the shape of your data, each individual timestamp/value pair can take as little as 2 bytes, including all the tag and measurement metadata. Continuous queries and retention policies let the database manage the data lifecycle automatically, aggregating data for longer-term trends while dropping older high-precision data.

Continue reading Announcing InfluxDB, Telegraf, Kapacitor and Enterprise 1.0 RC1

Announcing InfluxDB 1.0 Beta – A Big Step Forward for Time-Series


The team at InfluxData is excited to announce the immediate availability of InfluxDB 1.0 Beta plus the rest of the components in the TICK stack: Telegraf, Chronograf, and Kapacitor. While there are many new features — exponential smoothing via Holt-Winters queries, templates for Kapacitor TICKscripts, Telegraf CloudWatch integration, and dozens of bug fixes — this release marks a significant point in the development of these projects.

First, we’ve had customers and community members running the TICK stack in production at significant scale for months, so we are comfortable that the quality of the codebase is worthy of the 1.0 moniker. Second, we’re ready to lock down the API and commit to zero breaking changes for a significant length of time. This is especially important for organizations building products and services on top of the InfluxData stack, whose products may have longer development cycles or require a higher degree of stability from the codebase to ensure continuity for their customers and users.

Getting to 1.0 GA

This release is the first beta of the upcoming 1.0 GA release. We still have some known bugs to fix, but from here until 1.0 we’ll be focused on testing, benchmarking, and bug fixes. What about new features? They’ll come in the point releases after 1.0. For community members, this beta is what you should be testing against. For some users, the beta may even be suitable for production use. Many fixes have gone into all the projects since the 0.13 release nearly 4 weeks ago.

Continue reading Announcing InfluxDB 1.0 Beta – A Big Step Forward for Time-Series

Announcing InfluxDB v0.13 Plus Clustering Early Access


The team at InfluxData is excited to announce the immediate availability of v0.13 of the TICK stack!

What’s the TICK stack? It’s InfluxData’s end-to-end platform for collecting, storing, visualizing and alerting on time-series data at scale. Learn more.

We are also pleased to announce early access to InfluxEnterprise. If you have been waiting for InfluxDB clustering to run on your infrastructure with the ability to rebalance nodes, plus a slick UI to deploy, manage and monitor the deployment, InfluxEnterprise is for you.

enterprise_gif

Like what you see? Contact us to request a demo and get early access plus pricing from one of our solutions architects.

InfluxDB v0.13

v0.13 is a significant InfluxDB release delivering robust features and stability to the single-server experience. It is also our final major milestone before we launch 1.0. New in this release:

  • Upgrade to Go v1.6.2 delivering performance gains
  • Support for DELETE FROM
  • Time zone support – an offset argument in the GROUP BY time(…) call
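As a sketch of these two query features, assuming a Telegraf-style `cpu` measurement (the measurement and field names here are illustrative, not from the release notes):

```sql
-- DELETE FROM: drop old points from a measurement
DELETE FROM cpu WHERE time < '2016-01-01'

-- GROUP BY time() offset: shift interval boundaries by 8h,
-- e.g. so daily buckets align with a local midnight
SELECT mean("usage_idle") FROM cpu
WHERE time > now() - 7d
GROUP BY time(1d, 8h)
```

Without the offset argument, daily buckets start at 00:00 UTC; the second argument shifts every bucket boundary by the given duration.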

There are a total of 27 new features and 32 bug fixes! Detailed release notes can be found here and downloads here.

Telegraf v0.13

New in this release:

  • New Input Plugin: “filestat” to gather metrics about file existence, size, etc.
  • HAProxy socket support
  • Support for Docker container IDs to track per container metrics

There are a total of 17 new features and 21 bug fixes! Detailed release notes can be found here and downloads here.

Chronograf v0.13

New in this release:

  • Design update and performance improvements to the visualization index pages
  • “Visualization cards” have been replaced with more performant list items

chrono
Downloads are here.

Kapacitor v0.13

InfluxDB’s native data processing and alerting engine now supports BYOA (bring your own algorithm) analytics. UDFs written in the language of your choice and running in Docker containers can be launched independently of Kapacitor by exposing a Unix socket. On startup, Kapacitor connects to the socket and begins communication, applying the UDF logic to the time-series data being processed.

New in this release:

  • UDFs can be connected over a Unix socket. This enables UDFs running in separate Docker containers to be applied to the Influx data pipeline.
  • CLI features to ID, list and delete tasks, recordings and replays
  • API lock down to a stable 1.0 version

There are a total of 18 new features and 8 bug fixes! Detailed release notes can be found here and downloads here.

But wait, there’s more…

FREE Technical Papers

Today we also launched a new “Technical Papers” landing page on influxdata.com where developers and architects can download long reads that dive deep into a variety of InfluxDB related topics. Right now we’ve got three papers posted:

Make sure to check back often as we’ll be uploading new papers every other week!

FREE Virtual Training Summer Schedule is Live!

We’ve just added eight sessions to the virtual training schedule. In addition, we’ll be debuting three new topics including:

  • Benchmarking InfluxDB and Elasticsearch for Time-Series Data Management
  • Migrating from InfluxDB 0.8 and Up to 0.13
  • Intro to Kapacitor for Alerting and Anomaly Detection

What’s Next?

Combining Kapacitor and Continuous Queries


Kapacitor can be used to do the same work as Continuous Queries (CQs) in InfluxDB. Today we are going to explore reasons to use one over the other, and the basics of using Kapacitor for CQ-type workloads.

What’s Kapacitor? It’s the “K” in the TICK stack. Kapacitor is InfluxDB’s native data processing engine. It can process both stream and batch data from InfluxDB. Kapacitor lets you plug in your own custom logic or user-defined functions to process alerts with dynamic thresholds, match metrics for patterns, or compute statistical anomalies.

An Example

First, let’s take a simple CQ and rewrite it as a Kapacitor TICKscript.

Here is a CQ that computes the mean of cpu.usage_idle every 5m and stores it in the new measurement mean_cpu_idle.

CREATE CONTINUOUS QUERY cpu_idle_mean ON telegraf
BEGIN
    SELECT mean("usage_idle") AS usage_idle
    INTO mean_cpu_idle
    FROM cpu
    GROUP BY time(5m), *
END


To do the same with Kapacitor here is a streaming TICKscript.

stream
    |from()
        .database('telegraf')
        .measurement('cpu')
        .groupBy(*)
    |window()
        .period(5m)
        .every(5m)
        .align()
    |mean('usage_idle')
        .as('usage_idle')
    |influxDBOut()
        .database('telegraf')
        .retentionPolicy('default')
        .measurement('mean_cpu_idle')
        .precision('s')

The same thing can also be done as a batch task in Kapacitor.

batch
    |query('SELECT mean(usage_idle) as usage_idle FROM "telegraf"."default".cpu')
        .period(5m)
        .every(5m)
        .groupBy(*)
    |influxDBOut()
        .database('telegraf')
        .retentionPolicy('default')
        .measurement('mean_cpu_idle')
        .precision('s')


All three of these methods produce the same results.

Continue reading Combining Kapacitor and Continuous Queries

Announcing InfluxDB, Telegraf and Kapacitor 0.12 – Kill Query and New Functions


We are excited to announce that InfluxDB, Telegraf and Kapacitor 0.12.0 GA have been released and are ready for download.

InfluxDB improvements

This is the first version of the stand-alone InfluxDB server. It’s important to note that to upgrade you’ll have to do a small migration of the metadata. See the upgrade details here. For users of the open source products looking for an HA option, we’ve also completed the first version of the InfluxDB Relay.

One of the more important features of this release is query management and limits. You can now see which queries are running by issuing the following query:

`SHOW QUERIES`

You can then kill one of those queries by running:

`KILL QUERY <query_id>`

This is one of the long-requested features and gives administrators the ability to kill a query that has been running for too long. We’ve also added some configuration options that let administrators limit how many points, series, or group-by intervals a query can hit.

If a user issues a query that exceeds those limits, it will be terminated and an error returned. You can see the settings in the example config file.
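A quick sketch of the query-management commands; the query ID shown is hypothetical, standing in for whatever SHOW QUERIES reports on your server:

```sql
-- List running queries (returns an id, the query text,
-- the database, and how long each query has been running)
SHOW QUERIES

-- Kill a specific query by its id
KILL QUERY 36
```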

This release also added support for two functions that have long been on the request list: difference and moving average!
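For illustration, the two new functions might be used against a Telegraf-style `cpu` measurement like this (the measurement and field names are assumptions, not from the release notes):

```sql
-- Change between consecutive 5m means of idle CPU
SELECT DIFFERENCE(mean("usage_idle")) FROM cpu
WHERE time > now() - 1h GROUP BY time(5m)

-- Moving average over a window of 6 consecutive 5m means
SELECT MOVING_AVERAGE(mean("usage_idle"), 6) FROM cpu
WHERE time > now() - 1h GROUP BY time(5m)
```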

This release has 22 features and 13 bug fixes, see the CHANGELOG for full details.

Telegraf Enhancements

What’s Telegraf? It’s InfluxDB’s native data collector that leverages a plugin architecture with over 60 inputs and outputs supported already!

Lots of great features shipping with 0.12, including:

  • Parse environment variables in the config file: you can now specify env variables in your config file (such as `$USER` or `$MYSQL_HOST`). This can be used anywhere in the config file as strings, booleans, or integers for configuration items or tags.
  • JSON serializer: Telegraf now has the ability to output data in JSON.
  • Parse singular values: Telegraf can parse individual values from scripts or executables. This means that you can specify something as simple as `cat /proc/sys/kernel/random/entropy_avail` as an `exec` command, and get parsed data!
  • Nagios parser: allows you to run nagios scripts using the exec plugin.
  • IPMI hardware sensors: If you have ipmitool installed on your system, you can now utilize the `ipmi_sensor` plugin to grab data from each sensor.
  • Couchbase input plugin: gather stats from your Couchbase servers.

Kapacitor Enhancements

What’s Kapacitor? It’s InfluxDB’s native alerting and pre/post data processing engine that supports user-defined functions (UDFs, invoked from TICKscripts) to enable things like anomaly detection in real time.

Kapacitor v0.12.0 brings significant improvements to the TICKscript syntax. With two new operators to distinguish different kinds of methods, you will never be left guessing what structure the task pipeline has or when a UDF is being used. In addition to the new syntax, a utility for formatting TICKscripts to a common standard has been included. Reading and writing TICKscripts is now a cleaner, more precise task, so you can focus on the problem at hand. Along with these changes several bugs have been squashed; see the CHANGELOG release notes for more details.

What’s next

  • Download 0.12 of the TICK stack!
  • Need help migrating from 0.8.x or 0.9.x to 0.12? We are here to help! Drop us a line at contact@influxdata.com to get your migration project started.
  • Looking to level up your InfluxDB knowledge? Check out our economically priced virtual and public trainings.

InfluxDB, Telegraf and Kapacitor 0.11 GA Now Available!


We’re excited to announce that InfluxDB, Telegraf and Kapacitor 0.11 GA are now available for immediate download.

New in InfluxDB 0.11 GA – Query Performance Gains

InfluxDB 0.11 has huge improvements to the query engine that improve performance and stability and solve some of the out-of-memory issues some users were seeing on larger queries. The new query engine is anywhere from 1.4x to 3.8x faster for many queries. By popular request, we’ve also started creating ARM builds! This release has many other improvements, including 22 features and 42 bug fixes. (A list of breaking changes is in the release notes section here.)

The new query engine is important because it lays the foundation for many exciting new query features. New functions and ways to transform and combine series will be in the upcoming releases as a result of this effort. We’ll also give administrators the ability to see and even kill long running queries along with other controls limiting resource utilization.

Now let’s take a look at some of the performance gains from this query engine work. The first test uses 10,000 unique series, each with 1,000 points. We then ran queries to count all of the points in these series. With this query we saw a 29% decrease in query response time!

Continue reading InfluxDB, Telegraf and Kapacitor 0.11 GA Now Available!

Part 7: How-to Create an IoT Project with the TICK Stack on the Google Cloud Platform


Part 7: Collecting System Sensor Data with Telegraf

The last part of this tutorial looks at Telegraf, the “T” in the TICK stack. Telegraf is an agent that is used to collect metrics from various input channels and write them to output channels. It supports over 60 plugins, which can function as the input source or output target of data.

The agent is completely plugin-driven, and while it supports many plugins out of the box, you can write your own plugins too.

Our tutorial so far has looked at collecting temperature data from multiple weather stations and persisting it in InfluxDB. In addition, we looked at setting up Chronograf to view the temperature data via a dashboard, and at setting up alerts via Kapacitor that pushed notifications to Slack in case the temperature went over a certain limit.

At this point, the data is being collected via Raspberry Pi stations that capture the temperature readings, and the flow is pretty much in place. Where we will utilize Telegraf is in monitoring the CPU, memory, and other system parameters of the InfluxDB server.

  • Telegraf comes with an input plugin named `system`. This plugin captures various metrics about the system it is running on, like memory usage, CPU, disk usage, and more. We shall use this plugin to capture the cpu and memory metrics on the InfluxDB server.
  • The captured input metrics will need to be sent to an output system. In our case, we will push this data into InfluxDB itself. This will capture these metrics into an InfluxDB database on which we could then build out dashboards and alerts via Chronograf and Kapacitor. Sounds neat. The output plugin therefore will be InfluxDB.

The diagram below depicts what we are going to do:

tele1
Installing Telegraf

We are going to install Telegraf on the InfluxDB Server instance. Currently we just have one instance running in the Google Cloud and we will be setting it up on that.

As mentioned earlier, the VM runs Debian Linux, so we can follow the steps for installing Telegraf given at the official documentation site. Install the latest distribution of Telegraf as shown below:

wget http://get.influxdb.org/telegraf/telegraf_0.10.2-1_amd64.deb

sudo dpkg -i telegraf_0.10.2-1_amd64.deb

Configuring Telegraf

We need to provide a configuration file to Telegraf. This configuration file will contain not just Agent configuration parameters but also the input and output plugins that you wish to configure.

Telegraf supports a ton of plugins for both input and output, and it provides a command to generate a telegraf.conf (configuration file) containing all of the input and output plugin configuration sections. That is a useful thing to keep with you, but not what we need here.

We will be using the following generic command to generate a Telegraf configuration file for us:

telegraf -sample-config -input-filter <pluginname>[:<pluginname>] -output-filter <outputname>[:<outputname>] > telegraf.conf

In our case, we want the cpu and mem input plugins and the influxdb output plugin, so we generate a `telegraf.conf` as shown below:

telegraf -sample-config -input-filter cpu:mem -output-filter influxdb > telegraf.conf

Let us look at the key sections in the generated `telegraf.conf` file:

  • [agent]: This is the section for the Telegraf agent itself. Ideally we do not want to tweak too much here. Do note that you can change the frequency (time interval) at which data collection is done for all inputs via the `interval` property.
  • The next section is one or more `outputs`. In our case, it is just the influxdb output, i.e. `[[outputs.influxdb]]`. Two properties are key here: `urls` and `database`. The `urls` property is a list of InfluxDB instances. In our case there is just one, and we are running Telegraf on the same machine as the InfluxDB instance, so the endpoint points to the InfluxDB API endpoint at `http://localhost:8086`. Similarly, the `database` property is the database in which the input metrics will be collected. By default it is set to `telegraf`, but you can change it to another one. I will go with the default.
  • The next sections are for the inputs. You can see that it has created the `[[inputs.cpu]]` and `[[inputs.mem]]` inputs. Check out the documentation for both cpu and mem inputs.
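Putting the three sections together, a trimmed sketch of the generated file looks roughly like this (exact keys and defaults vary by Telegraf version):

```toml
[agent]
  interval = "10s"   # how often all inputs are polled

[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "telegraf"

[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]
  # no options needed; memory stats are collected as-is
```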

Starting Telegraf and collecting metrics

Let us start the Telegraf Agent now via the following command:

telegraf -config telegraf.conf

We could have pushed the generated `telegraf.conf` into the `/etc/telegraf` folder and started Telegraf as a service, but for the purposes of this tutorial, this is fine.

On successful startup, it displays an output as shown below:

$ telegraf -config telegraf.conf
2016/02/15 04:36:39 Starting Telegraf (version 0.10.2)
2016/02/15 04:36:39 Loaded outputs: influxdb
2016/02/15 04:36:39 Loaded inputs: cpu mem
2016/02/15 04:36:39 Tags enabled: host=instance-1
2016/02/15 04:36:39 Agent Config: Interval:10s, Debug:false, Quiet:false, Hostname:"instance-1", Flush Interval:10s

Recall that one of the properties of the Telegraf agent was the `interval` property, which was set to 10 seconds. This is the interval at which it polls all the inputs for data.

Here is the output from several data collection intervals:

2016/02/15 04:36:40 Gathered metrics, (10s interval), from 2 inputs in 531.909µs
2016/02/15 04:36:50 Gathered metrics, (10s interval), from 2 inputs in 447.937µs
2016/02/15 04:36:50 Wrote 4 metrics to output influxdb in 3.39839ms
2016/02/15 04:37:00 Gathered metrics, (10s interval), from 2 inputs in 482.658µs
2016/02/15 04:37:00 Wrote 3 metrics to output influxdb in 4.324979ms
2016/02/15 04:37:10 Gathered metrics, (10s interval), from 2 inputs in 775.612µs
2016/02/15 04:37:10 Wrote 3 metrics to output influxdb in 7.472159ms
2016/02/15 04:37:20 Gathered metrics, (10s interval), from 2 inputs in 438.388µs
2016/02/15 04:37:20 Wrote 3 metrics to output influxdb in 3.219223ms
2016/02/15 04:37:30 Gathered metrics, (10s interval), from 2 inputs in 419.607µs
2016/02/15 04:37:30 Wrote 3 metrics to output influxdb in 3.159644ms
2016/02/15 04:37:40 Gathered metrics, (10s interval), from 2 inputs in 426.761µs
2016/02/15 04:37:40 Wrote 3 metrics to output influxdb in 3.894155ms
2016/02/15 04:37:50 Gathered metrics, (10s interval), from 2 inputs in 449.508µs
2016/02/15 04:37:50 Wrote 3 metrics to output influxdb in 3.192695ms
2016/02/15 04:38:00 Gathered metrics, (10s interval), from 2 inputs in 498.035µs
2016/02/15 04:38:00 Wrote 3 metrics to output influxdb in 3.831951ms
2016/02/15 04:38:10 Gathered metrics, (10s interval), from 2 inputs in 448.709µs
2016/02/15 04:38:10 Wrote 3 metrics to output influxdb in 3.246991ms
2016/02/15 04:38:20 Gathered metrics, (10s interval), from 2 inputs in 514.15µs
2016/02/15 04:38:20 Wrote 3 metrics to output influxdb in 3.838368ms
2016/02/15 04:38:30 Gathered metrics, (10s interval), from 2 inputs in 520.263µs
2016/02/15 04:38:30 Wrote 3 metrics to output influxdb in 3.76034ms
2016/02/15 04:38:40 Gathered metrics, (10s interval), from 2 inputs in 543.151µs
2016/02/15 04:38:40 Wrote 3 metrics to output influxdb in 3.917381ms
2016/02/15 04:38:50 Gathered metrics, (10s interval), from 2 inputs in 487.683µs
2016/02/15 04:38:50 Wrote 3 metrics to output influxdb in 3.787101ms
2016/02/15 04:39:00 Gathered metrics, (10s interval), from 2 inputs in 617.025µs
2016/02/15 04:39:00 Wrote 3 metrics to output influxdb in 4.364542ms
2016/02/15 04:39:10 Gathered metrics, (10s interval), from 2 inputs in 517.546µs
2016/02/15 04:39:10 Wrote 3 metrics to output influxdb in 4.595062ms
2016/02/15 04:39:20 Gathered metrics, (10s interval), from 2 inputs in 542.686µs
2016/02/15 04:39:20 Wrote 3 metrics to output influxdb in 3.680957ms
2016/02/15 04:39:30 Gathered metrics, (10s interval), from 2 inputs in 526.083µs
2016/02/15 04:39:30 Wrote 3 metrics to output influxdb in 4.32718ms
2016/02/15 04:39:40 Gathered metrics, (10s interval), from 2 inputs in 504.632µs
2016/02/15 04:39:40 Wrote 3 metrics to output influxdb in 3.676524ms
2016/02/15 04:39:50 Gathered metrics, (10s interval), from 2 inputs in 640.896µs
2016/02/15 04:39:50 Wrote 3 metrics to output influxdb in 3.773236ms
2016/02/15 04:40:00 Gathered metrics, (10s interval), from 2 inputs in 491.794µs
2016/02/15 04:40:00 Wrote 3 metrics to output influxdb in 3.608919ms
2016/02/15 04:40:10 Gathered metrics, (10s interval), from 2 inputs in 571.12µs
2016/02/15 04:40:10 Wrote 3 metrics to output influxdb in 3.739155ms
2016/02/15 04:40:20 Gathered metrics, (10s interval), from 2 inputs in 505.122µs
2016/02/15 04:40:20 Wrote 3 metrics to output influxdb in 4.151489ms

Since we have the InfluxDB server running along with the Admin interface endpoints, we can investigate the `telegraf` database from the Admin interface itself (you could do that via the InfluxDB shell too!).
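Whether from the Admin interface or the shell, a few simple queries are enough to confirm the data is arriving. The queries below are a sketch assuming the default database and measurement names:

```sql
-- In the influx shell, first select the database: USE telegraf
SHOW MEASUREMENTS
SELECT * FROM cpu LIMIT 5
SELECT * FROM mem LIMIT 5
```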

tele2
Here are some of the `cpu` measurement records:

tele3
Here are some of the `mem` measurement records:

tele4
As a next step, you could hook in visualization (Chronograf) or alerts (Kapacitor) into this Telegraf database.

Conclusion

This concludes the 7-part tutorial on using the TICK stack from InfluxData. The TICK stack provides a best-in-class set of components for building modern and extensible solutions on a time-series database. We hope this tutorial gave you a glimpse into its potential and helps you get started creating winning applications.

What’s next?

  • Get started with InfluxDB here.
  • Looking to level up your InfluxDB knowledge? Check out our economically priced virtual and public trainings.