OpenCensus Metrics and InfluxDB

OpenCensus is the latest in a series of projects to emerge from Google as a result of its decades of experience running “planet-scale” systems. It is a collection of libraries, implemented in a number of languages, designed to help developers increase the observability of their systems. OpenCensus provides APIs for gathering metrics and trace data, adding tags, and sending that data to a back-end for storage. It can also create zPages, HTTP endpoints which expose additional information about the application in question.

Instrumentation is a key part of any observability strategy, and the OpenCensus project provides a standards-based approach to adding those features to your codebase. As developers and cloud providers throw their support behind OpenCensus, we wanted to ensure a seamless interaction with the rest of the TICK Stack.

Fortunately, OpenCensus embraces a number of other open source technologies and standards which make that integration easy. The project provides a number of “exporters,” the components that send data to the back-end, including one for metrics that uses the Prometheus exposition format and one for traces that uses Zipkin.

The Prometheus exposition format is something that we’ve supported for some time now in various parts of the TICK Stack, including a Prometheus input plugin for our collection agent, Telegraf, which allows it to scrape metrics from Prometheus-style endpoints.
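If you haven’t seen the exposition format before, it is plain text: optional HELP and TYPE comment lines, followed by one line per sample. The following is a minimal sketch in Go that renders a single counter this way. It uses only the standard library and is not how OpenCensus or its exporter produces the output; the counter type, the metric name, and the label are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// counter is a hypothetical in-memory metric used only for this illustration.
type counter struct {
	name   string
	help   string
	labels map[string]string
	value  float64
}

// render formats a counter as Prometheus exposition text:
// a HELP line, a TYPE line, and a single sample line.
func render(c counter) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", c.name, c.help)
	fmt.Fprintf(&b, "# TYPE %s counter\n", c.name)

	// Sort label names so the output is deterministic.
	keys := make([]string, 0, len(c.labels))
	for k := range c.labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q quotes the value; adequate for simple ASCII label values.
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, c.labels[k]))
	}

	b.WriteString(c.name)
	if len(pairs) > 0 {
		b.WriteString("{" + strings.Join(pairs, ",") + "}")
	}
	fmt.Fprintf(&b, " %g\n", c.value)
	return b.String()
}

func main() {
	fmt.Print(render(counter{
		name:   "example_lines_total", // hypothetical metric name
		help:   "Lines read by the example.",
		labels: map[string]string{"method": "repl"}, // hypothetical label
		value:  3,
	}))
}
```

Real exporters also handle gauges, histograms, and label escaping rules that this sketch leaves out.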

As a result, we can easily collect data from any applications that have been instrumented with OpenCensus by making a few configuration changes to the Telegraf configuration file and enabling the Prometheus plugin.

Let’s dive in! This post assumes that you have Telegraf and InfluxDB already installed. If you don’t, the InfluxData Sandbox will get you up and running quickly with Docker and Docker Compose.

Instrumenting Go Code with OpenCensus

First, we need some code to instrument. Fortunately, we can make use of the example provided in the OpenCensus Quickstart guides. The libraries for the Go programming language are probably the farthest along, so we’ll use Go for this blog post.

The quickstart tutorial involves building a simple REPL which capitalizes any strings provided to it as input, while collecting metrics about the application. We’ll use this example pretty much line-for-line, with one exception: we’ll replace the Stackdriver exporter with the Prometheus one.

Working through the tutorial, the first thing we do is create a simple REPL, followed by adding instrumentation code, including metrics and tags, and then recording those metrics. We create a few views and then set up the exporter. You can scroll down to the last snippet of code and then click the “All” tab to grab the full application.

We’ll need to make a few changes to use the Prometheus exporter; fortunately, OpenCensus provides some example code for that as part of the documentation on exporters. We’ll do a bit of code surgery: replacing a few import statements and updating the initialization and registration of the exporter. I added the existing code to a GitHub Gist, then updated it so you can view the diff if you’re interested.

Save the modified code and we can run the REPL:

go run repl.go

Enter a few lines into the REPL and then visit http://localhost:9091/metrics; you should see some metrics exposed in Prometheus format. As you enter new lines into the REPL, those metrics should change.
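If you’d rather check the endpoint from code than from a browser, here is a rough sketch of a standard-library Go program that fetches http://localhost:9091/metrics and lists the distinct metric names it finds. The sampleNames helper is hypothetical and deliberately naive: it skips comment lines and ignores labels and values, which is far less than what Telegraf’s Prometheus input actually does.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// sampleNames returns the distinct metric names found in exposition-format
// text, in order of first appearance. Comment lines (# HELP, # TYPE) and
// blank lines are skipped; labels and values are ignored.
func sampleNames(body string) []string {
	seen := map[string]bool{}
	var names []string
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The name ends at the first '{' (labels) or space (value).
		end := strings.IndexAny(line, "{ ")
		if end <= 0 {
			continue // not a well-formed sample line
		}
		name := line[:end]
		if !seen[name] {
			seen[name] = true
			names = append(names, name)
		}
	}
	return names
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get("http://localhost:9091/metrics")
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not reach the REPL's metrics endpoint:", err)
		return
	}
	defer resp.Body.Close()

	data, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not read response:", err)
		return
	}
	for _, name := range sampleNames(string(data)) {
		fmt.Println(name)
	}
}
```

Run it while the REPL is up; the names printed depend on the views your application registers.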

Collecting OpenCensus Metrics

As I mentioned earlier, Telegraf, the InfluxData collection agent, has built-in support for both scraping and exposing metrics in the Prometheus exposition format. All it takes to enable Prometheus metrics collection is to edit a few lines in the configuration file.

If you don’t already have a config file, you can generate one with the following command:

telegraf --sample-config --input-filter prometheus --output-filter influxdb > telegraf.conf

The --sample-config argument prints a sample configuration to standard output, which we redirect into telegraf.conf, while the filter options limit which plugin sections are included in it. To enable the collection of Prometheus metrics, we’re using --input-filter prometheus, which results in an [[inputs.prometheus]] section being added to our config.

We need to edit that section and point it to the URL our local REPL is using as its metrics endpoint, http://localhost:9091/metrics. We can use all of the other defaults, as follows:

[[inputs.prometheus]]
  ## An array of urls to scrape metrics from.
  urls = ["http://localhost:9091/metrics"]

  ## An array of Kubernetes services to scrape metrics from.
  # kubernetes_services = ["http://my-service-dns.my-namespace:9100/metrics"]

  ## Use bearer token for authorization
  # bearer_token = /path/to/bearer/token

  ## Specify timeout duration for slower prometheus clients (default is 3s)
  # response_timeout = "3s"

  ## Optional TLS Config
  # tls_ca = /path/to/cafile
  # tls_cert = /path/to/certfile
  # tls_key = /path/to/keyfile
  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false

Because we’re using all of the defaults at this point, Telegraf is writing to a telegraf database, collecting data every 10 seconds and flushing it to the outputs at the same interval. We’re also running InfluxDB on the same machine, so the default InfluxDB output configuration should be sufficient.
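For reference, the defaults described above correspond to settings along these lines in the generated file. This is an abbreviated excerpt as I understand the stock sample config; depending on your Telegraf version, some of these lines may appear commented out, so check your own file rather than relying on this sketch.

```toml
[agent]
  ## Default data collection interval for all inputs
  interval = "10s"
  ## Default flushing interval for all outputs
  flush_interval = "10s"

[[outputs.influxdb]]
  ## URL of the local InfluxDB instance
  urls = ["http://127.0.0.1:8086"]
  ## The target database for metrics
  database = "telegraf"
```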

Start Telegraf using the config file:

telegraf --config telegraf.conf

You should see metrics showing up in the telegraf database after the first collection interval has elapsed! Let’s verify this using the influx command-line tool to execute a query.

Launch the CLI and use the telegraf database:

$ influx
Connected to http://localhost:8086 version 1.6.1
InfluxDB shell version: v1.6.1
> USE telegraf
Using database telegraf
>

Then, run a query to get the last five minutes of data for the demo_demo_latency measurement:

> SELECT * FROM demo_demo_latency WHERE time > now()-5m

You should see a number of points, depending on how long your application has been running.
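The same check can be made without the CLI, since InfluxDB 1.x also answers queries over HTTP at its /query endpoint using the db and q parameters. Below is a minimal Go sketch under the assumptions used in this post (InfluxDB on localhost:8086, a telegraf database); the queryURL helper is a hypothetical name, and the program simply prints the raw JSON response.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// queryURL builds a request URL for the InfluxDB 1.x /query endpoint,
// URL-encoding the database name and the InfluxQL statement.
func queryURL(base, db, q string) string {
	v := url.Values{}
	v.Set("db", db)
	v.Set("q", q)
	return base + "/query?" + v.Encode()
}

func main() {
	u := queryURL(
		"http://localhost:8086", // assumes a local, unauthenticated InfluxDB
		"telegraf",
		"SELECT * FROM demo_demo_latency WHERE time > now()-5m",
	)

	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(u)
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not reach InfluxDB:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not read response:", err)
		return
	}
	// The response is JSON; print it as-is rather than decoding it.
	fmt.Println(string(body))
}
```

If authentication is enabled on your instance, the request will need credentials as well, which this sketch omits.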

Next Steps

As you can see, it’s extremely easy to collect metrics from an OpenCensus-instrumented application using Telegraf and the OpenCensus Prometheus exporter. If you already have an application that’s been instrumented with OpenCensus, then you might want to consider installing Chronograf and adding a few dashboards, or setting up Kapacitor and configuring alerts.