Kubernetes monitoring and autoscaling with Telegraf and Kapacitor

Nathan Huago — November 8, 2016

With the 1.1 release of Telegraf and Kapacitor, InfluxData is improving the ease of use, depth of metrics and level of control we provide in maintaining and monitoring a Kubernetes cluster. InfluxDB has been a part of Kubernetes’ monitoring since v0.4 when the first version of Heapster was released and InfluxDB remains the default data sink.

Over the past two years we have worked with our users to get the most out of their Kubernetes monitoring setups. Unfortunately, Heapster’s output was never tuned to make efficient use of InfluxDB’s storage backend, so some of our users saw the data store consume more resources than they expected. Meanwhile, much of the new development in Kubernetes metrics and monitoring does not natively integrate with Heapster.

So, we reached out to the community to rethink how Kubernetes data is collected and stored. In the process, we found several common themes. First, no one seems committed to the long-term future of Heapster. Second, people are split on the push vs. pull model of gathering metrics; the InfluxData philosophy is to provide support for both. Third, since there are no current standards for the central distribution of Kubernetes metrics, users were forced to go to multiple sources and, more often than not, had an incomplete view of their data. Finally, almost everyone wanted the ability to trigger the horizontal pod autoscaler using metrics other than those currently provided.

Telegraf

To ensure that we were gathering the richest set of metrics possible, we spent time with a number of current Kubernetes community members, including the monitoring team at Deis. Their pre-configured monitoring solution for the Deis Workflow requires additional Kubernetes metrics that are provided by Heapster. That, and the fact that they were already users of Telegraf and InfluxDB, inspired them to create an input plugin. They were kind enough to contribute that input plugin, which gathers metrics from the Kubernetes /stats/summary API endpoint. This plugin, along with the Prometheus plugin pointed at kube-state-metrics, provides Telegraf with a wide range of Kubernetes metrics. Running Telegraf as a DaemonSet lets users receive push-style metrics delivery while getting many of the same benefits provided by pull-style metrics. The data collected by Telegraf, combined with Heapster, gives InfluxDB users access to key metrics for Kubernetes internals.
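To make the data flow concrete, here is a minimal Python sketch of the kind of transformation such an input plugin performs: it flattens a response shaped like the /stats/summary payload into InfluxDB line protocol. This is an illustration, not the plugin's actual code. The function name is hypothetical, and the measurement, tag and field names are assumptions chosen for the example.

```python
from typing import Dict, List


def summary_to_line_protocol(summary: Dict, timestamp_ns: int) -> List[str]:
    """Flatten a /stats/summary-shaped dict into InfluxDB line protocol.

    Hypothetical helper for illustration only. Assumes each container entry
    carries cpu.usageNanoCores and memory.workingSetBytes, and that tag
    values contain no spaces or commas (so no escaping is done here).
    """
    lines = []
    node_name = summary["node"]["nodeName"]
    for pod in summary.get("pods", []):
        ref = pod["podRef"]
        for container in pod.get("containers", []):
            # Tags identify where the sample came from.
            tags = (
                f"node_name={node_name},"
                f"namespace={ref['namespace']},"
                f"pod_name={ref['name']},"
                f"container_name={container['name']}"
            )
            # Fields hold the measured values; the trailing "i" marks integers.
            fields = (
                f"cpu_usage_nanocores={container['cpu']['usageNanoCores']}i,"
                f"memory_working_set_bytes={container['memory']['workingSetBytes']}i"
            )
            lines.append(
                f"kubernetes_pod_container,{tags} {fields} {timestamp_ns}"
            )
    return lines
```

Each returned string is one point: a measurement name with its tags, then the fields, then a nanosecond timestamp. Keeping identifiers such as pod and namespace in tags and numeric readings in fields is what lets InfluxDB index and query the data efficiently.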

Kapacitor

The most common request we heard from the community was for an easy way to scale their Kubernetes cluster with more flexibility than the horizontal pod autoscaler provides. Kapacitor 1.1 provides an easy path to resize your replica sets, deployments and replication controllers using any data stored in InfluxDB. We added Kubernetes replica set sizing logic to our flexible data processing framework, allowing Kapacitor to consider additional metrics when scaling pods up or down. For example, you can scale application servers based on page requests, or processing pods based on queue depth.
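The queue-depth case can be sketched as a small sizing rule. The Python below is a hedged illustration of the arithmetic only, not Kapacitor's implementation. The function name, the target of items per pod, and the replica bounds are all hypothetical values chosen for the example.

```python
import math


def desired_replicas(
    queue_depth: float,
    target_per_pod: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Return how many pods are needed to keep each one at or below target.

    Hypothetical sizing rule: divide the backlog by the per-pod target,
    round up, then clamp the result to the allowed replica range.
    """
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    wanted = math.ceil(queue_depth / target_per_pod)
    return max(min_replicas, min(max_replicas, wanted))
```

With a target of 100 queued items per pod, a backlog of 450 items asks for 5 replicas, an empty queue falls back to the minimum of 1, and a backlog of 5000 is capped at the maximum of 10. Clamping matters in practice because a metric spike or a missing data point should not scale a workload to zero or without limit.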

We have created an example of using Kapacitor to resize an application replica set. Please try it out and let us know what you think – we are always interested in the community’s feedback or code contributions.
