Kubernetes is an orchestration platform for containerized workloads (for example, Docker containers). Like other automation frameworks, Kubernetes relies on monitoring, but only on the specific checks and metrics it needs to make orchestration decisions. It doesn’t monitor the performance and reliability of the whole application environment. Kubernetes therefore cannot be left unwatched: ineffective or counterproductive automation could lead to degradations and downtime.
As Kubernetes adoption grows, with applications fragmented into microservices running on ephemeral containers and, most importantly, continuously integrated and delivered, monitoring Kubernetes for performance and reliability has become increasingly important. Prometheus monitoring does part of the job by collecting metrics. But what happens when you want to monitor events in your Kubernetes clusters? What about applications that don’t expose metrics in Prometheus format and are better suited to collection methods other than pulling, such as pushing and streaming? Furthermore, what about monitoring various data types, numeric and non-numeric, with different retention policies (months, years… forever)?
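The pull-versus-push distinction above can be sketched with the two plain-text wire formats involved: the Prometheus exposition format, which a Prometheus server pulls by scraping an HTTP /metrics endpoint, and the InfluxDB line protocol, which a client pushes to the database. The sketch below renders the same CPU sample in both; the metric names, tags and values are hypothetical examples, not taken from any real cluster:

```python
# Hypothetical illustration: one CPU sample expressed two ways.
# Metric/measurement names and tag values are made up for the example.

def prometheus_exposition(name: str, labels: dict, value: float) -> str:
    """Render a metric in the Prometheus text exposition format,
    which a Prometheus server *pulls* from a /metrics endpoint."""
    label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
    return f"{name}{{{label_str}}} {value}"

def influx_line_protocol(measurement: str, tags: dict, fields: dict,
                         timestamp_ns: int) -> str:
    """Render the same data as InfluxDB line protocol, which a client
    *pushes* to the database."""
    tag_str = ",".join(f"{k}={v}" for k, v in tags.items())
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},{tag_str} {field_str} {timestamp_ns}"

pull_sample = prometheus_exposition(
    "container_cpu_usage_seconds_total",
    {"pod": "web-7f9c", "namespace": "prod"}, 12.5)
push_sample = influx_line_protocol(
    "cpu_usage", {"pod": "web-7f9c", "namespace": "prod"},
    {"seconds_total": 12.5}, 1633046400000000000)

print(pull_sample)
print(push_sample)
```

Note the operational difference: in the pull model the application only has to expose the string and the server decides when to scrape; in the push model the application (or an agent) controls when and where the sample is sent, which suits short-lived containers and streaming sources.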
Push and pull metric collection mechanisms, stream ingestion, real-time analytics, high availability, and cost-effective long-term storage all matter when diving deeper into monitoring Kubernetes application environments. Equally key are visibility and analytics across all monitoring and logging data — metrics, events and logs — from a single pane, for a complete and more insightful view.
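As one illustration of the long-term storage point, InfluxDB (1.x) lets you declare how long each class of data is kept using InfluxQL retention policies; the database and policy names below are hypothetical:

```sql
-- Hypothetical names ("k8s", "thirty_days", "five_years"); InfluxQL, InfluxDB 1.x.
-- Keep raw, high-resolution Kubernetes metrics for 30 days by default...
CREATE RETENTION POLICY "thirty_days" ON "k8s" DURATION 30d REPLICATION 1 DEFAULT

-- ...and downsampled rollups for roughly five years.
CREATE RETENTION POLICY "five_years" ON "k8s" DURATION 260w REPLICATION 1
```

Pairing a short default policy for raw data with longer policies for downsampled rollups is a common way to keep storage costs in check without losing historical trends.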
Most production environments don’t take a single approach to application deployment and monitoring. You should therefore consider solutions that can handle variances, custom implementations, and business uniqueness, while leaving room to evolve over time.
In this technical paper, we discuss the importance of observing beyond nodes, containers and Prometheus monitoring, to include application and Kubernetes events, states and — very importantly — custom instrumentation and logs. What data collection methods can be used? What Kubernetes dashboards could be created to leverage the different types of data collected? Where could scalability bottlenecks arise in a monitoring solution? And why does the InfluxDB open source time series platform provide the foundation needed to address all of these monitoring and scalability needs?