tick-charts Update

This is an update to an earlier blog post I wrote on tick-charts, the project to make the InfluxData stack as easy as possible to run on Kubernetes. While developing the project I’ve learned that some of the original implementation decisions were not the right way to go! Chronograf has also matured significantly since then, requiring additions to the chart to support a few different flavors of OAuth: Heroku, GitHub, and Google.
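For context, Chronograf reads its OAuth settings from environment variables (for GitHub: TOKEN_SECRET, GH_CLIENT_ID, and GH_CLIENT_SECRET), so the chart surfaces them through values. The snippet below is only a sketch of what that can look like; the key names here are my assumption, so check the chronograf chart’s values.yaml for the actual schema:

# Hypothetical values snippet wiring up GitHub OAuth for Chronograf.
# Key names are assumed; consult the chart's values.yaml for the real schema.
oauth:
  github:
    enabled: true
    clientID: "<your-github-client-id>"
    clientSecret: "<your-github-client-secret>"
  tokenSecret: "<random-signing-secret>"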

Telegraf Changes

The primary change from the first version is the re-architecture of the telegraf chart. Initially I thought a monolithic chart for telegraf would be the best way to go. That chart put host-level monitoring, as well as polling, in a single values.yaml file. Because telegraf is primarily configuration-driven, holding both configurations in one file quickly became too complex. Another downside was that spinning up a single telegraf instance required deleting a pile of daemonset-specific configuration.

To reduce this complexity I’ve taken a two-pronged approach. First, I split the telegraf chart in two, providing a much cleaner interface for users. Next, I reduced the amount of configuration required for the daemonset chart (telegraf-ds). The defaults provide the basics for host-level monitoring in Kubernetes: Kubelet and Docker polling, plus CPU, memory, disk, system load, and network statistics. All that is required of the user is to set config.outputs.influxdb.url. If you don’t want to host your own InfluxDB, you can easily spin up an InfluxDB Cloud instance to hold the data.
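For example, a minimal values.yaml for telegraf-ds might look like this (the URL is illustrative; point it at your own InfluxDB endpoint):

# values.yaml for telegraf-ds -- the only setting you must provide.
# The URL below is an example; substitute your own InfluxDB service.
config:
  outputs:
    influxdb:
      url: "http://influxdb.tick.svc.cluster.local:8086"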

The chart for spinning up single telegraf instances is now called telegraf-s. Currently it is implemented in much the same way as the old telegraf chart: using some custom Go templates to generate the configuration. This approach is difficult and error-prone for plugins that require substantial configuration, such as snmp or jolokia. To eliminate that complexity and the burden of maintaining custom code, I’ve added a toToml template function to Helm. Once the 2.3 release of Helm is available, the telegraf-s chart will be modified to use it, which will make creating telegraf instances in your cluster extremely easy. Using a tool like remarshal to convert the TOML emitted by telegraf into YAML, the workflow for generating a new telegraf instance to monitor a piece of your cluster will look like this:

# On macOS, at least (pbcopy puts the result on the clipboard)...
$ telegraf -sample-config -input-filter nginx:cloudwatch \
    -output-filter kafka_producer | toml2yaml | pbcopy

The resulting blob of YAML can be pasted directly into your values.yaml file under the config section and edited there.
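As a rough sketch, the pasted section might look something like this (the plugin settings are illustrative, not the chart’s actual defaults):

# Hypothetical result of the pipeline above, pasted under config.
# Plugin settings are illustrative; edit them for your environment.
config:
  inputs:
    nginx:
      - urls:
          - "http://localhost/server_status"
  outputs:
    kafka_producer:
      - brokers:
          - "kafka.tick.svc.cluster.local:9092"
        topic: "telegraf"

Once Helm 2.3 lands, the chart can render that section straight back into telegraf.conf with toToml. The template below is a sketch of that idea, not the chart’s actual template:

# templates/configmap.yaml -- a sketch of rendering the values-supplied
# config as telegraf.conf using the new toToml function.
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-telegraf
data:
  telegraf.conf: |
{{ toToml .Values.config | indent 4 }}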

Next Steps

I plan to continue improving the production readiness, examples, and user experience for deploying the TICK stack on Kubernetes. The following items are on my # TODO: list for this project:

  • Production Readiness
    • Make InfluxDB deploy with basic authentication enabled
    • Add backup/restore job examples to back up InfluxDB to S3 or another object store
    • Create job to dynamically reload configuration for telegraf on upgrade
  • User Experience
    • Create a top-level chart so that the whole deployment can be managed from one chart
    • Reduce the time from zero to dashboards to as short as possible
    • Address any pain points that begin to show with increasing usage
  • Examples
    • Monitoring Prometheus endpoints with Telegraf
    • Using queues in Kubernetes with Telegraf
    • Monitoring {{ .telegraf_plugin.name }} (suggestions welcome!)

I would also love input on what features you would like to see in this integration. Please post your questions, concerns, suggestions, or other comments on this post on our Community site. I can’t wait to hear from you!