Chrony is an implementation of the Network Time Protocol (NTP) for keeping computer clocks synchronized. It is a replacement for ntpd, the reference implementation of NTP, and was designed to keep time synchronized over intermittent network connections (e.g. laptops that might not run constantly), on congested networks, or when clock speeds vary when loads are low. Timekeeping is important, especially for users of time series databases, who rely on the transactions they are tracking being in the proper order and the correct time sequence. This is easier said than done, since time series data may be collected on geographically dispersed hosts, each of which keeps its own time.
Typical accuracy between machines synchronized over the Internet is within a few milliseconds; on a LAN, it is typically within tens of microseconds. With hardware timestamping or a hardware reference clock, sub-microsecond accuracy may be possible, which could be important when trying to determine the cause of particular issues.
Why use a Telegraf plugin for Chrony?
Regardless of your use case (DevOps, IoT), if you are collecting time series data from geographically dispersed devices, you may need assurance that the timestamps are accurate. Chrony reduces your system clock's time drift, and the Chrony Telegraf Plugin lets you collect Chrony's metrics in InfluxDB and build dashboards on them. Dashboards are an easy way to see when drift looks like it could become a problem.
How to monitor Chrony using the Telegraf plugin
To get standard Chrony metrics, the chronyc executable is required. The following list shows all the headers that can be returned.
- Reference ID - This is the ref ID and name of the server to which the computer is currently synchronized. If the Reference ID is 127.127.1.1, then the computer is not synchronized to any external source and you are operating in ‘local’ mode.
- Stratum - The stratum indicates how many hops away the computer is from a computer with an attached reference clock.
- Ref time - This is the UTC time the last measurement from the reference source was processed.
- System time - Any error in the system clock is corrected by slightly speeding up or slowing down the system clock until the error has been removed, and then returning to the system clock’s normal speed. There will be a period when the system clock will be different from chronyd's estimate of the current true time, and the value reported on this line is the difference due to this effect.
- Last offset - This is the estimated local offset on the last clock update.
- RMS offset - This is a long-term average of the offset value.
- Frequency - The ‘frequency’ is the rate by which the system’s clock would be wrong if chronyd was not correcting it. It is expressed in ppm (parts per million).
- Residual freq - This shows the ‘residual frequency’ for the currently selected reference source. It reflects any difference between what the measurements from the reference source indicate the frequency should be and the frequency currently being used.
- Skew - This is the estimated error bound on the frequency.
- Root delay - This is the total of the network path delays to the stratum-1 computer from which the computer is ultimately synchronized.
- Root dispersion - This is the total dispersion accumulated through all the computers back to the stratum-1 computer from which the computer is ultimately synchronized. Dispersion is due to system clock resolution, statistical measurement variations, etc.
- Leap status - This is the leap status, which can be Normal, Insert second, Delete second or Not synchronized.
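To make the headers above concrete, here is a minimal Python sketch that splits text in the `Header : value` layout printed by `chronyc tracking` into a dictionary, then converts the Frequency value from ppm into seconds of drift per day. The sample text and its values are illustrative, and the helper names (`parse_tracking`, `leading_number`, `drift_seconds_per_day`) are hypothetical names chosen for this example; they are not part of Chrony or Telegraf.

```python
import re

# Illustrative sample in the "Header : value" layout of `chronyc tracking`.
# The values are example data, not measurements from a real host.
SAMPLE = """\
Reference ID    : CB00710F (foo.example.net)
Stratum         : 3
Ref time (UTC)  : Fri Jan 27 09:49:17 2017
System time     : 0.000006523 seconds slow of NTP time
Last offset     : -0.000006747 seconds
RMS offset      : 0.000035822 seconds
Frequency       : 3.225 ppm slow
Residual freq   : -0.000 ppm
Skew            : 0.129 ppm
Root delay      : 0.013639022 seconds
Root dispersion : 0.001100737 seconds
Update interval : 64.2 seconds
Leap status     : Normal
"""


def parse_tracking(text):
    """Split each 'Header : value' line into a dict entry (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        header, sep, value = line.partition(" : ")
        if sep:  # skip lines that do not contain the separator
            fields[header.strip()] = value.strip()
    return fields


def leading_number(value):
    """Return the first number in a value such as '3.225 ppm slow', or None."""
    match = re.match(r"[-+]?\d+(?:\.\d+)?", value)
    return float(match.group()) if match else None


def drift_seconds_per_day(ppm):
    """Convert a rate in parts per million into seconds gained or lost per day."""
    return ppm * 1e-6 * 86400


fields = parse_tracking(SAMPLE)
# The direction word ("slow" or "fast") is ignored here; only the magnitude is used.
freq_ppm = leading_number(fields["Frequency"])
print(fields["Stratum"], fields["Leap status"])
print(f"{drift_seconds_per_day(freq_ppm):.3f} s/day if uncorrected")
```

With the sample above, a frequency of 3.225 ppm works out to roughly 0.28 seconds of drift per day if chronyd were not correcting the clock, which shows why even small ppm values matter for timestamp ordering.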
Key Chrony metrics to use for monitoring
Some of the important Chrony metrics that you should proactively include in your monitoring are: