Prometheus Monitoring

A Guide to Prometheus Monitoring - Definition, Use Cases, Applications, and Resources

What is Prometheus?

Prometheus is an open source systems monitoring and alerting toolkit originally built at SoundCloud by ex-Googlers who wanted to monitor metrics on their servers and applications. It joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes. Prometheus is maintained independently of any single company and is very popular as a monitoring solution for Kubernetes metrics. Like InfluxDB, Prometheus is written in Go.

How does Prometheus Work as a Monitoring Solution?

The Prometheus website provides a great overview of the Prometheus monitoring solution and the underlying time series infrastructure. To monitor your services using Prometheus, each service needs to expose a Prometheus endpoint: an HTTP interface that lists the service's metrics and their current values. The Prometheus server then polls this metrics interface on each service and stores the data. This architecture is referred to as polling-based monitoring, or pull-based monitoring.
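As a rough illustration of what such an endpoint can look like, here is a minimal sketch using only the Python standard library. It renders a few metrics in the Prometheus text exposition format and serves them at /metrics. The metric names, values, and port are hypothetical; a production service would more likely use an official Prometheus client library than hand-rolled code like this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metrics: name -> (type, help text, current value).
# A real service would update these values as it handles work.
METRICS = {
    "app_requests_total": ("counter", "Total requests handled.", 42),
    "app_queue_depth": ("gauge", "Jobs currently queued.", 3),
}


def render_metrics(metrics):
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for name, (kind, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics with the current metric values."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving the metrics endpoint on the given port (assumed free)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint; a Prometheus server configured to scrape this host and port would then poll /metrics on its own schedule.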

For Kubernetes environments, service discovery is well integrated: Prometheus finds the metrics endpoints, polls them, and gathers the metrics into the Prometheus server for monitoring and alerting. A benefit of the pull approach is that it does not require you to install an agent to collect the metrics, although you still need to deploy “exporters” to expose the metrics from the systems you are monitoring.

[Figure: Prometheus monitoring diagram]

Push vs. Pull Based Metric Collection and Monitoring

In the pull-based method, the monitoring agent polls the targets being monitored periodically and alerts based on that data. In the push method, telemetry and metrics are pushed to the monitoring agent (or more frequently a time series database), and monitoring is done either through the agent or other processes querying the database.
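The pull side can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Prometheus server is implemented: it fetches a target's metrics page, parses each sample line, and checks one value against a threshold. The parser assumes sample lines carry no trailing timestamp, and the URL, series name, and limit in the usage note are hypothetical.

```python
import urllib.request


def parse_samples(text):
    """Parse exposition-format text into {series: value}.

    Simplifying assumption: each sample line is 'series value' with no
    trailing timestamp. Comment lines (# HELP, # TYPE) are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


def scrape(url, timeout=5.0):
    """Poll one target once and return its parsed samples."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_samples(resp.read().decode("utf-8"))


def above_limit(samples, series, limit):
    """Return True if the series is present and its value exceeds the limit."""
    return samples.get(series, float("-inf")) > limit
```

A poller would call something like scrape("http://localhost:8000/metrics") on a fixed interval and raise an alert whenever above_limit(samples, "app_queue_depth", 100) returns True.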

When instrumenting your own application code, you need to choose between the push and pull collection methods. Either you send metrics out to another service via a client library, or you make them available to others through some network addressable target (like an HTTP API, for example).

Prometheus has become the de facto standard for pull-based metrics in Kubernetes. By formalizing the pull method, it gives all kinds of services and applications a standard format for exposing targets from which metrics data can be pulled.

The primary disadvantage of pull-based methods is that they don’t work well for event-driven time series (like individual requests to an API, or events in your infrastructure). Another disadvantage is that all metric endpoints have to be reachable from the server, implying a more elaborate secure network configuration. This can also become an issue for large-scale deployments that require clustering for high availability.

For Kubernetes-only monitoring, pull-based metrics collection might be just fine, but for distributed environments, especially IoT architectures, push-based monitoring is preferable. Most environments need to monitor and alert on both metrics (regular time intervals) and events (irregular time intervals), so it’s preferable to support both push and pull. This is currently a limitation of Prometheus, and it is where InfluxData’s Telegraf and Kapacitor can enhance the Prometheus environment.
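To make the push side concrete, the following sketch formats a single irregular event as InfluxDB line protocol and builds (without sending) an HTTP request for the InfluxDB 1.x /write endpoint. The measurement, tag, and field names, the base URL, and the database name are all hypothetical, and the escaping covers only the common cases, so treat it as an illustration rather than a complete client.

```python
import urllib.request
from urllib.parse import urlencode


def _escape(value, chars):
    """Backslash-escape each of the given characters in value."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def _format_field(value):
    """Format a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, ts_ns):
    """Build one line: measurement,tag=value field=value timestamp."""
    head = _escape(measurement, ", ")
    for key, val in sorted(tags.items()):
        head += "," + _escape(key, ",= ") + "=" + _escape(val, ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(val)
        for key, val in sorted(fields.items())
    )
    return f"{head} {body} {ts_ns}"


def build_write_request(base_url, database, lines):
    """Build a POST request for the write endpoint; the caller decides when to send it."""
    query = urlencode({"db": database, "precision": "ns"})
    data = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(f"{base_url}/write?{query}", data=data, method="POST")
```

An application would call to_line_protocol at the moment each event occurs, for example once per API request, and pass the resulting request to urllib.request.urlopen. Because the sender decides when to write, irregular events are captured individually instead of being sampled on a polling interval.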

Augmenting Prometheus to Support Monitoring Using Push and Pull

Kapacitor can read all the metrics generated following the Prometheus standard. This means that any service discovery target that works with Prometheus will work with Kapacitor. In addition, with InfluxDB’s native support for the Prometheus remote read and write protocol, you can use Prometheus to collect data and InfluxDB as your long-term, highly available, scalable data store.

Using Kapacitor to monitor Prometheus scrape targets allows for further streaming analytics, advanced anomaly detection, or the ability to add custom logic that gets triggered on the streaming data before storing it in the underlying data store.

The Telegraf operator project provides additional options for supplementing or replacing Prometheus monitoring solutions with the InfluxDB platform.

Prometheus Server Architecture

One of the core values and primary design objectives of Prometheus is simplicity. To achieve this, Prometheus focuses on a single-node architecture and tunes the server for performance within that single node. Prometheus doesn’t have clustering on its roadmap, probably because the additional complexity would run against its design objective of simplicity.

PromQL

Prometheus Query Language (PromQL) is a functional query language that enables users to select and aggregate time series data in real time. The result of an expression can either be shown as a graph, viewed as tabular data in Prometheus’s expression browser, or consumed by external systems via HTTP API, for example, as an alert or notification.
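As an example of the HTTP API path, the sketch below builds an instant-query URL for the Prometheus /api/v1/query endpoint and unpacks a vector result from the JSON response. The server address and the PromQL expression in the usage note are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload):
    """Turn a decoded JSON response into a list of (labels, value) pairs.

    Only handles the 'vector' result type; anything else raises ValueError.
    """
    if payload.get("status") != "success":
        raise ValueError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unsupported result type: {data['resultType']}")
    # Each sample's value is [unix_timestamp, "value as a string"].
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def instant_query(base_url, promql, timeout=5.0):
    """Run an instant query against a Prometheus server and parse the result."""
    url = instant_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

For instance, instant_query("http://localhost:9090", "sum by (job) (rate(http_requests_total[5m]))") would return one (labels, value) pair per job, assuming a server at that address exposes a counter with that name.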

Augmenting Prometheus for High Availability

InfluxData’s InfluxDB has a similar single-node approach with some similar design objectives, but InfluxDB Enterprise includes clustering to support environments that require high availability. Because InfluxDB Enterprise includes Kapacitor and Telegraf, you can preserve any investment in building Prometheus endpoints while storing data on multiple nodes in a clustered InfluxDB Enterprise deployment. InfluxDB’s native scripting and querying language, Flux, supports PromQL, which reduces the headache of having to write queries for both data stores.

Conclusion

Using Prometheus for monitoring is a good choice in Kubernetes environments that require pull-based monitoring and alerting on metrics. For environments that require monitoring or alerting on both metrics and events, or where high availability is a requirement, consider augmenting your architecture with InfluxData’s InfluxDB Enterprise, or with InfluxDB and Kapacitor. InfluxData will continue to enhance support for Prometheus going forward. To stay up to date with the latest developments, follow the project on GitHub.

