When you are trying to work out what is affecting the performance of your servers, or deciding which VM size to choose, the first bottlenecks to check are memory, disk, and CPU. CPU metrics in particular help you determine whether the CPU is constantly being maxed out. The key metrics paint a picture of who is using the CPU (applications, processes, or the OS itself); which processes are demanding more cycles than the CPU can provide; whether the CPU is waiting for operations to complete; or whether it is doing no work at all.
Why use a Telegraf plugin for CPU?
Typically, when you track CPU performance, you collect and review memory and disk usage as well. The CPU Telegraf plugin gathers metrics on the system CPU that you can store in InfluxDB. You can also add memory, disk, and a whole host of other metrics with the various Telegraf plugins to build a complete picture of your environment’s performance. Because every plugin reports through the same agent, you can extend the same setup to many other use cases.
How to monitor CPU performance using the CPU Telegraf Plugin
Configuring the CPU Telegraf plugin is simple because there are only a handful of settings. They control whether to report per-CPU stats (percpu), whether to report total system CPU stats (totalcpu), whether to collect raw CPU time metrics (collect_cpu_time), and whether to compute and report the sum of all non-idle CPU states (report_active). Once this is set up, you can point Telegraf at your InfluxDB instance and start collecting and reporting on these metrics.
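The settings described above can be sketched as a small Python helper that writes out the plugin section of a Telegraf configuration file. The option names (percpu, totalcpu, collect_cpu_time, report_active) follow Telegraf's sample configuration for the cpu input. The helper name `render_cpu_config` and the default values chosen here are assumptions made for illustration. An output section for InfluxDB is not shown and would still be needed.

```python
def render_cpu_config(percpu=True, totalcpu=True,
                      collect_cpu_time=False, report_active=False):
    """Return a Telegraf [[inputs.cpu]] section as TOML text.

    Hypothetical helper: the defaults are illustrative assumptions,
    so check them against the sample config shipped with your
    Telegraf version.
    """
    def toml_bool(value):
        # TOML booleans are lowercase, unlike Python's True/False.
        return "true" if value else "false"

    lines = [
        "[[inputs.cpu]]",
        "  ## Whether to report per-CPU stats or not",
        f"  percpu = {toml_bool(percpu)}",
        "  ## Whether to report total system CPU stats or not",
        f"  totalcpu = {toml_bool(totalcpu)}",
        "  ## If true, collect raw CPU time metrics (time_* fields)",
        f"  collect_cpu_time = {toml_bool(collect_cpu_time)}",
        "  ## If true, compute and report the sum of all non-idle CPU states",
        f"  report_active = {toml_bool(report_active)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a section that also reports the time_active/usage_active fields.
    print(render_cpu_config(report_active=True))
```

The printed text can be pasted into a Telegraf configuration file. Generating it from code is only a convenience for templating many hosts, and editing the file by hand works just as well.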
Key CPU metrics to use for monitoring
Some of the important CPU metrics that you should proactively monitor include:
- cpu (tag): CPU ID or cpu-total
- time_user (float)
- time_system (float)
- time_idle (float)
- time_active (float)
- time_nice (float)
- time_iowait (float)
- time_irq (float)
- time_softirq (float)
- time_steal (float)
- time_guest (float)
- time_guest_nice (float)
- usage_user (float, percent)
- usage_system (float, percent)
- usage_idle (float, percent)
- usage_active (float, percent)
- usage_nice (float, percent)
- usage_iowait (float, percent)
- usage_irq (float, percent)
- usage_softirq (float, percent)
- usage_steal (float, percent)
- usage_guest (float, percent)
- usage_guest_nice (float, percent)
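To show how the time_* counters and the usage_* percentages in the list above relate, here is a minimal Python sketch. It assumes each usage value is the share of elapsed CPU time spent in that state between two samples, and that the active value is everything except idle. The function name `usage_percent`, the choice of states that make up the total, and the sample numbers are assumptions for illustration, not the plugin's actual code.

```python
# States that make up total CPU time in this sketch. Guest time is left
# out on the assumption that it is already counted inside user time.
CPU_STATES = ("user", "system", "idle", "nice",
              "iowait", "irq", "softirq", "steal")


def usage_percent(prev, curr):
    """Turn two samples of cumulative CPU times into usage percentages.

    prev and curr map a state name to cumulative time, like the time_*
    fields without the prefix. Returns an empty dict if no time elapsed.
    """
    deltas = {state: curr[state] - prev[state] for state in CPU_STATES}
    total = sum(deltas.values())
    if total <= 0:
        return {}
    usage = {f"usage_{state}": 100.0 * delta / total
             for state, delta in deltas.items()}
    # Active is the sum of all non-idle states.
    usage["usage_active"] = 100.0 - usage["usage_idle"]
    return usage


if __name__ == "__main__":
    # Made-up sample values, one collection interval apart.
    prev = {"user": 100.0, "system": 50.0, "idle": 800.0, "nice": 0.0,
            "iowait": 30.0, "irq": 0.0, "softirq": 5.0, "steal": 0.0}
    curr = {"user": 130.0, "system": 60.0, "idle": 850.0, "nice": 0.0,
            "iowait": 35.0, "irq": 0.0, "softirq": 10.0, "steal": 0.0}
    for name, value in sorted(usage_percent(prev, curr).items()):
        print(f"{name}: {value:.1f}")
```

Read this way, a consistently high usage_iowait points at the CPU waiting on I/O rather than doing work, while a high usage_steal on a VM suggests the hypervisor is giving cycles to other guests.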