Telegraf Plugins used:
- 2 Labels
- 2 Dashboards: Kubernetes Node Metrics, Kubernetes Inventory
- 1 Variable
- 1 Telegraf Config
If you have your InfluxDB credentials configured in the CLI, you can install this template with the Quick Install command.
Kubernetes Monitoring Dashboard
Kubernetes Inventory Dashboard
Kubernetes is an open-source container-orchestration system for automating application deployment, scaling, and management. Originally designed by Google and now maintained by the Cloud Native Computing Foundation, Kubernetes is widely used to eliminate the manual steps involved in deploying and scaling containerized applications.
Kubernetes was built expressly to automate the Linux container operations that enterprises handle every day. It eliminates many of the important but time-consuming manual processes involved in deploying and scaling containerized applications, freeing people to focus on the tasks that actually need their attention.
This brings a wide array of benefits, including faster deployments, better workload portability, and a natural fit with DevOps practices. It also simplifies provisioning for developers who are pressed for time and opens up possibilities that otherwise wouldn't be realistic.
Why monitor Kubernetes
Monitoring Kubernetes helps ensure that clusters use their underlying resources efficiently. Efficient use also requires active management and continuous optimization based on historical data: collecting and analyzing Kubernetes resource metrics makes it possible to identify idle resources, dynamically optimize the infrastructure footprint, and cut waste. CPU and memory are the two types of resources that containers consume, and both can be managed using resource requests and resource limits. A request is the amount of a resource reserved for a container when it is scheduled, and a limit is the maximum amount the container is allowed to consume.
All of this matters because monitoring the current state of an application is one of the best ways to 1) anticipate problems and 2) discover potential bottlenecks in a fast-paced production environment. This is especially important if you're running a large number of diverse applications that communicate with one another constantly. A single point of failure anywhere in the chain can stop the entire process in its tracks, and Kubernetes monitoring is one of the best ways to catch such failures early.
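To make the relationship between requests, limits, and observed usage concrete, here is a minimal Python sketch. The `ContainerCpu` record, the helper names, and the 50% threshold are all hypothetical and chosen only for illustration; they are not part of the template. The sketch assumes every container has a non-zero CPU limit and that usage figures have already been collected.

```python
from dataclasses import dataclass


@dataclass
class ContainerCpu:
    """CPU figures for one container, all in cores (hypothetical sample shape)."""
    name: str
    request: float  # cores reserved for the container when it is scheduled
    limit: float    # most cores the container is allowed to use (assumed > 0)
    usage: float    # observed average usage, e.g. from collected metrics


def idle_cores(c: ContainerCpu) -> float:
    """Requested cores that the container is not actually using."""
    return max(c.request - c.usage, 0.0)


def limit_utilization(c: ContainerCpu) -> float:
    """Fraction of the CPU limit currently in use."""
    return c.usage / c.limit


def over_provisioned(containers, threshold=0.5):
    """Names of containers using less than `threshold` of their request."""
    return [c.name for c in containers if c.usage < threshold * c.request]


if __name__ == "__main__":
    sample = [
        ContainerCpu("api", request=1.0, limit=2.0, usage=0.25),
        ContainerCpu("worker", request=0.5, limit=1.0, usage=0.5),
    ]
    for c in sample:
        print(c.name, idle_cores(c), limit_utilization(c))
    print("candidates for smaller requests:", over_provisioned(sample))
```

A container that consistently uses far less than its request is a candidate for a smaller request, while one running close to its limit may need a higher limit.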
The Kubernetes Monitoring Template provides two basic Kubernetes dashboards: Kubernetes Node Metrics and Kubernetes Inventory. The template supports Kubernetes environments running on Google Cloud Platform, on AWS, and on-premises.
How to use the Kubernetes Monitoring Template
Once your InfluxDB credentials have been configured in the CLI, you can install the Kubernetes Monitoring Template using the Quick Install command. Once installed, the dashboards are populated by the included Telegraf configuration, which includes the relevant Kubernetes input. Note that you might need to customize the input configuration to suit your environment, for example by specifying a different input value. The right settings depend on how your organization runs Kubernetes.
To find out more about environment variables within the Telegraf configuration, consult the Telegraf configuration documentation.
For a deeper look at how to monitor Kubernetes with InfluxDB, see the lessons learned from building and running InfluxDB Cloud on Kubernetes. In this video from InfluxDays London, Gianluca Arbezzano covers which metrics to collect, when to use push versus pull metric collection, and the role that Prometheus plays in a K8s monitoring environment.
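As a rough illustration of how an environment variable placeholder in a Telegraf configuration resolves to a concrete value, the Python sketch below substitutes `${HOST_IP}` into a small `inputs.kubernetes` fragment. Telegraf performs its own substitution when it loads the file; this code only mimics the idea. The variable name `HOST_IP`, the address, and the option values are assumptions for illustration and are not taken from the template's shipped configuration.

```python
from string import Template

# Illustrative fragment only; the real template's Telegraf config may differ.
CONFIG_FRAGMENT = Template('''\
[[inputs.kubernetes]]
  url = "https://${HOST_IP}:10250"
  bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
  insecure_skip_verify = true
''')


def render(env: dict) -> str:
    """Replace ${NAME} placeholders with values from `env`.

    Raises KeyError if a placeholder has no value, which makes a
    missing variable obvious in this sketch.
    """
    return CONFIG_FRAGMENT.substitute(env)


if __name__ == "__main__":
    # HOST_IP is a hypothetical variable name; in a cluster it would
    # typically be injected into the Telegraf pod's environment.
    print(render({"HOST_IP": "10.0.0.5"}))
```

Keeping node-specific values such as the kubelet address in environment variables lets the same configuration file run unchanged on every node.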
Key Kubernetes metrics to monitor
Some of the most important Kubernetes metrics to monitor proactively include:
- Current and Past Nodes
- CPU Metrics
- Node Count
- Pod Capacity
- Host Count
- Allocatable CPU cores
- Average CPU capacity
- Running Containers
- Nodes CPU usage
- Daemon Sets
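Several of the metrics above (node count, pod capacity, allocatable CPU cores, average CPU capacity) are simple aggregates over per-node data. The Python sketch below shows one way to derive them. The dictionary shape loosely mirrors the `capacity` and `allocatable` fields of a Kubernetes node status, but the sample values and the `summarize` helper are hypothetical, and the dashboards in the template compute their figures from Telegraf data rather than with this code.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "500m" or "2" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores to cores
    return float(quantity)


def summarize(nodes: list) -> dict:
    """Aggregate a few inventory metrics across a list of node records."""
    capacities = [parse_cpu(n["capacity"]["cpu"]) for n in nodes]
    return {
        "node_count": len(nodes),
        "pod_capacity": sum(int(n["capacity"]["pods"]) for n in nodes),
        "allocatable_cpu_cores": sum(
            parse_cpu(n["allocatable"]["cpu"]) for n in nodes
        ),
        # Guard against an empty cluster to avoid dividing by zero.
        "average_cpu_capacity": sum(capacities) / len(nodes) if nodes else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical sample data for two nodes.
    nodes = [
        {"capacity": {"cpu": "4", "pods": "110"},
         "allocatable": {"cpu": "3500m", "pods": "110"}},
        {"capacity": {"cpu": "2", "pods": "110"},
         "allocatable": {"cpu": "1500m", "pods": "110"}},
    ]
    print(summarize(nodes))
```

Allocatable CPU is usually lower than capacity because part of each node is reserved for system components, which is why the two are tracked separately.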