Monitoring TLS Certificates with Telegraf

We’ve all been there. You’re sitting eating your lunch in the office canteen and you notice a flurry of people walking briskly and asking each other to check the website on their phone. Is it just one phone? Oh, it’s your phone too. Maybe it’s the WiFi … they check on 4G …

The faces slowly turn in your direction, eyes catching awkwardly. You feel your phone vibrate in your pocket … not just once. Production can’t be down, you think. My pager hasn’t gone off … everything must be fine, they’re confused; right?

Oh dear. The x509 / TLS certificate expired and nobody in the world can browse our high-profile, 24x7, worldwide, super amazing website.

While it’s common for operators and developers to monitor their systems using the metrics we treasure so dearly (RED, USE, the four golden signals), something simple is often overlooked: the x509 certificates we use to serve our websites, to authenticate microservices, or to authenticate against the Kubernetes API.

Fortunately, we’ve got you covered! Telegraf has had an x509_cert plugin for many years now, and it couldn’t be easier to set up.

Configuring the plugin

The x509_cert input plugin supports both local and remote x509 sources. So whether you’re running Telegraf as a daemonset on your Kubernetes cluster, monitoring your local cert directory, or running a single instance to monitor your certificates from a user’s perspective, the plugin can handle it.

[[inputs.x509_cert]]
    sources = ["https://www.example.org:443", "/etc/tls/certs/www.example.org"]
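
To make the remote check more concrete, here is a minimal Python sketch of the idea behind it: connect to an endpoint, read the certificate’s notAfter date, and turn it into a number of seconds remaining. This is an illustration only, not how Telegraf implements the plugin; the helper names seconds_until_expiry and remote_not_after are hypothetical.

```python
import socket
import ssl


def seconds_until_expiry(not_after: str, now: float) -> int:
    """Seconds from `now` (a Unix timestamp) until the notAfter date.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT".
    """
    return int(ssl.cert_time_to_seconds(not_after) - now)


def remote_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Fetch the notAfter date of the certificate served at host:port.

    Assumption: the default context verifies the chain, so a certificate
    that has already expired raises ssl.SSLCertVerificationError here
    instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Calling seconds_until_expiry(remote_not_after("www.example.org"), time.time()) would give a value comparable in spirit to what the plugin reports, though the plugin does this for every configured source on each collection interval.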

Available metrics

Now that we’ve got Telegraf collecting our x509 metrics and sending them to InfluxDB, we can begin to build a query to alert on certificate expiration. Fortunately, this is almost as simple as configuring the plugin.

SELECT (expiry / 60 / 60 / 24) as "expiry" FROM "telegraf"."autogen"."x509_cert"

The expiry field is reported in seconds, so dividing it by 60, 60, and 24 returns the number of days until each certificate expires.
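
The same arithmetic can be sketched in Python if you prefer to evaluate the threshold outside the query. The 30-day default and the function names are assumptions chosen for illustration; pick whatever lead time matches your renewal process.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # same divisors as in the query above


def expiry_days(expiry_seconds: float) -> float:
    """Convert the expiry value (seconds remaining) into days."""
    return expiry_seconds / SECONDS_PER_DAY


def is_expiring_soon(expiry_seconds: float, warn_days: float = 30.0) -> bool:
    """True when the certificate expires within warn_days (hypothetical default)."""
    return expiry_days(expiry_seconds) <= warn_days
```

For example, an expiry value of 2592000 seconds corresponds to exactly 30 days, which would trip the default threshold.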

Telegraf provides the following tags to filter or add to your alerts:

  • common_name
  • country
  • locality
  • organization
  • organizational_unit
  • province
  • source

If you’ve got Telegraf configured to add the hostname to each measurement, that will also be available as a tag. Only rely on it when running Telegraf as a daemonset or on bare metal.
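
As a rough sketch of how the tags listed above might be used, the Python below filters query results by tag and builds an alert message from them. The row layout (a dict holding tag values plus the expiry field in seconds) is an assumption about how you have loaded the results, and both function names are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # assumed shape: tag values plus "expiry" in seconds


def filter_by_tags(rows: List[Row], **tags: str) -> List[Row]:
    """Keep only the rows whose tag values match every given tag."""
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in tags.items())
    ]


def alert_message(row: Row) -> str:
    """Build a human-readable alert line from a row's tags and expiry."""
    days = row["expiry"] / 86400  # seconds to days
    return (
        f"Certificate {row['common_name']} from {row['source']} "
        f"expires in {days:.1f} days"
    )
```

Filtering on source narrows an alert to one endpoint or file, while common_name is usually the most recognisable value to put in the message itself.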

Telegraf makes it so simple to monitor these certificates that nobody should ever be caught off guard again.