Prometheus + InfluxDB: Thoughts After the Austin Monitoring Meetup

At the Austin Monitoring Meetup last night, it was great to see how Prometheus and InfluxDB can be used together. I’ll highlight some of the details from my talk, but first I’d like to point to some work that Julius Volz, one of the creators of Prometheus, showed off and that I think is quite exciting.

Julius gave a talk about Prometheus, introducing the project, the data model, its query language, and its alerting system. He also talked about the project’s philosophy on pull-based monitoring and why Prometheus prefers it. I was nodding my head in agreement with many of his points. Any system can connect and pull metrics (my laptop, production monitoring, staging, new monitoring, etc.). It’s easy to pull up the /metrics endpoint to eyeball what is being exposed. Services don’t need to know about the monitoring system. I’ve actually been positive on pull-based methods for a while, and we’ll be introducing support for pull in addition to our traditional push-based method in a few weeks. But I’m getting ahead of myself.
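To make the pull model concrete, here is a minimal sketch of a service exposing a /metrics endpoint in the Prometheus text exposition format, using only the Python standard library. The counter name `demo_requests_total`, the `render_metrics` and `serve` helpers, and port 8000 are all made up for illustration; a real service would more likely use an official Prometheus client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter; a real service would update it as it works.
COUNTERS = {"demo_requests_total": 0}


def render_metrics(counters):
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name in sorted(counters):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {counters[name]}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; anything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        COUNTERS["demo_requests_total"] += 1
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given (arbitrary) port."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

After calling `serve()`, anything that speaks HTTP (a browser, curl, or a Prometheus server) can read the current values from http://127.0.0.1:8000/metrics. The service itself knows nothing about who is scraping it.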

At the end of Julius’ talk he showed off a demo of Prometheus and InfluxDB, using InfluxDB as a remote storage backend for Prometheus. Prometheus has been able to push metrics to other storage backends for a while, but this new work adds the ability to read from remote storage backends as well. He hooked everything up, then ran a query that brought back data. Then he deleted the local Prometheus data store and restarted Prometheus. Running the query again still returned all the data, because Prometheus was pulling it in real time from InfluxDB.

How did the support for InfluxDB happen? I recently emailed Julius asking him about the remote storage system talk that he gave a few weeks ago at CNCFCon. After a weekend hack session, Julius added this ability. I’m excited about having the two projects work together and we have a few things we can do in the near term to improve the experience for Prometheus + InfluxDB users. Primarily, we’ll bring the remote gateway into InfluxDB so that remote storage can be configured and used without the need for an additional piece of software to run and manage. As we look towards improvements for InfluxDB 2.0, we’ll keep our eye out for features and integrations that will make Prometheus and InfluxDB work better together.

In my talk yesterday, I covered what the data models look like for Graphite, OpenTSDB, Prometheus, and InfluxDB. I used this material as an intro to the new data model I’d like InfluxDB to support. I also covered some high-level thoughts on each project’s query language and the future evolution of InfluxQL. In short, I’m thinking we need to simplify our data model and expose a functional query language that is more powerful, extensible, and expressive. I’ll be posting a PR with lengthy documentation and justification for the proposed updates, to gather community feedback and improvements before we start initial implementation.
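As a rough illustration of how those four data models differ, the sketch below formats one sample (CPU user time of 0.64 on one host) in each system’s text format: Graphite’s plaintext protocol, the OpenTSDB telnet-style `put` command, the Prometheus exposition format, and InfluxDB 1.x line protocol. The function names, the metric names, and the choice to put the tags into the Graphite path are illustrative assumptions, and escaping of special characters is left out.

```python
def graphite_line(path, value, ts_seconds):
    # Graphite: one dotted path, no tags; metadata is encoded in the path itself.
    return f"{path} {value} {ts_seconds}"


def opentsdb_put(metric, value, ts_seconds, tags):
    # OpenTSDB: metric name plus key=value tags; timestamp in seconds.
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {ts_seconds} {value} {tag_str}"


def prometheus_sample(metric, value, ts_seconds, labels):
    # Prometheus: metric name plus quoted labels; timestamp in milliseconds.
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{metric}{{{label_str}}} {value} {ts_seconds * 1000}"


def influx_line(measurement, field, value, ts_seconds, tags):
    # InfluxDB 1.x: measurement, tags, then named fields; timestamp in nanoseconds.
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_str} {field}={value} {ts_seconds * 1_000_000_000}"


# Hypothetical sample used for all four formats.
tags = {"host": "web01", "region": "us-west"}
ts = 1480000000

print(graphite_line("us-west.web01.cpu.user", 0.64, ts))
print(opentsdb_put("cpu.user", 0.64, ts, tags))
print(prometheus_sample("cpu_user", 0.64, ts, tags))
print(influx_line("cpu", "user", 0.64, ts, tags))
```

The main structural difference is that Graphite folds everything into a single hierarchical name, OpenTSDB and Prometheus attach key/value pairs to a single-valued metric, and InfluxDB 1.x additionally lets one point carry several named fields.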

Here are the slides from my talk. In advance of more detailed information, here are some of the high-level requirements:

  • Must be able to support InfluxDB 1.x data model
  • Must be able to support InfluxDB 1.x queries
  • Support Prometheus data model
  • New Functional Query Language
  • Rich Query Builder UI (users shouldn't need to learn the query language to get insight & visibility)
  • Query Completion CLI
  • Possible support for PromQL queries?

I think we’re in for some significant improvements over the next 12 months. I’m pushing for an advancement on the level of the move from InfluxDB 0.8 to 1.x, but this time without breaking changes. The new InfluxDB will be a drop-in replacement for 1.x. It may require a data migration, but it should support InfluxDB 1.x users with a clear path forward to more advanced functionality. This work will likely take the rest of the year and beyond before it’s ready for production, but we’ll be releasing early prototypes and moving 1.x forward with continued feature releases.