How to Time Your Data Collection with Telegraf Agent Settings
Samantha Wang
Apr 12, 2022
Many Telegraf and InfluxDB users spend a lot of time finding the right balance between getting in the data they want and writing so much that they have to deal with unnecessary data in their database. This blog post will give you a better understanding of Telegraf’s data collection settings and help you fine-tune your configuration.
You can use this post as a guide to set when and how often Telegraf collects data from your device, which also helps you work with endpoints that constrain the number of queries that can be performed.
This post will only cover the settings for data received by Telegraf. In another post, I’ll cover how data gets transmitted and sent out from Telegraf.
Set how often you want to gather metrics with interval
The interval setting determines how often Telegraf gathers metrics and is the data collection interval for all inputs. The more often you want to gather your metrics, the lower you should configure your interval.
The interval setting can be set at the agent or plugin level. You would use it at the plugin level if one input should be run less or more often than others.
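As a minimal sketch of the two levels, the fragment below sets an agent-wide interval and overrides it for a single input. The cpu and disk input plugins and the 60-second override are assumptions chosen purely for illustration; they are not taken from this post.

```toml
# Agent-level setting: applies to every input unless overridden.
[agent]
  interval = "10s"

# Inherits the agent interval and collects every 10 seconds.
[[inputs.cpu]]

# Plugin-level override: this input collects once a minute instead.
[[inputs.disk]]
  interval = "60s"
```

Keeping the common value at the agent level and overriding only the exceptions keeps the configuration short and makes the outliers easy to spot.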
With the default interval of 10 seconds, Telegraf collects your data every 10 seconds. If you wanted to collect twice as often, you would lower the setting to interval = "5s" so that collection runs at the 5, 10, 15, 20 second marks, and so on, of every minute. The interval setting is rather straightforward, but it is also one of the most important configurations for the behavior of your data collection.
In the above graphic, with the settings interval = "10m" and round_interval = true, Telegraf collects data every 10 minutes on the rounded tens.
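In a Telegraf configuration file, these two settings might look roughly like the fragment below. It is a sketch of only the relevant agent options, not a complete configuration.

```toml
[agent]
  # Collect every 10 minutes.
  interval = "10m"
  # Align collection to the rounded tens of the hour: :00, :10, :20, and so on.
  round_interval = true
```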
round_interval determines if Telegraf will round collection interval
The round_interval setting determines whether Telegraf rounds the collection interval. It is set as a boolean (true or false). In the previous example, where round_interval was set to true, Telegraf collected on the rounded 10 minutes. If the interval is set to "5m" and round_interval is enabled, then Telegraf will collect at :05, :10, :15, :20, etc. Timestamps are especially crucial when working with time series data. The round_interval setting helps ensure your timestamps from all sources are aligned and uniform across all your Telegraf agents.
However, if we change round_interval to false, Telegraf collects right when it starts and then continues at whatever interval is set. In the example below, we’ll set round_interval = false and keep the same interval = "10m". If we start Telegraf at the 7th minute of the hour, Telegraf will immediately gather metrics and then continue collecting every 10 minutes, on the 17, 27, 37, 47, 57 minutes of the hour.
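Written out as configuration, this unaligned setup could be sketched as follows. The comments restate the behavior described in this section; the 7-minute start time is simply the one used in the example.

```toml
[agent]
  interval = "10m"
  # Do not align to the clock: collect at startup, then every 10 minutes.
  # If Telegraf starts at :07, collection continues at :17, :27, :37, ...
  round_interval = false
```

Because the schedule now depends on when each agent was started, timestamps from different agents will generally not line up with each other.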
Offset collection by a random amount using collection_jitter
If you have a large number of inputs configured, adding some randomness to the collection process can help ensure targets do not get overloaded. You can use the collection_jitter setting for this exact use case to jitter the collection by a random amount. This setting can be used at the agent or input plugin level. If used at the plugin level, each plugin will sleep for a random time within the set jitter before collecting.
Because the jitter is a random sleep before collecting, each collection lands somewhere between the scheduled interval tick and that tick plus the jitter amount. So if we keep our interval = "10m" and set collection_jitter = "2m", then each input will fire off at a random point within the 2 minutes that follow each 10-minute tick.
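A minimal sketch of this jitter setup at the agent level is shown below. Only the two options discussed here are included; everything else is left at whatever your configuration already uses.

```toml
[agent]
  interval = "10m"
  # Each input sleeps for a random time within this jitter before collecting,
  # so inputs do not all hit their targets at the same moment.
  collection_jitter = "2m"
```

The trade-off is that collection times, and therefore timestamps, are no longer exactly predictable from one run to the next.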
Precisely schedule collection with collection_offset
The collection_offset setting is a new agent setting that was added in Telegraf 1.22. It is used to shift the collection by a given amount of time. This setting allows you to avoid many plugins querying constrained devices at the same time by manually scheduling them in time, as opposed to collection_jitter, which offsets collection by a random amount.
If you have remote data that gets updated at the same time in the middle of the night, you can use collection_offset to schedule the collection precisely after it gets uploaded and continue doing so on a daily cadence.
In this example, we will maintain the interval = "10m" setting, return collection_jitter to its default, and set collection_offset = "6m". With round_interval enabled, no matter what time we start Telegraf, we will collect data on the 6, 16, 26, 36, 46, 56 minute of every hour.
Also, as opposed to collection_jitter, collection_offset allows you to maintain a consistent and exact schedule that Telegraf collects on. It gives you controlled data collection with predictable timestamps.
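The offset example from this section could be sketched in configuration as follows. Setting round_interval = true and writing the jitter out as "0s" are assumptions made here so that the offset is applied relative to the rounded ten-minute marks with no randomness; the post itself only specifies the interval and the offset. This requires Telegraf 1.22 or later.

```toml
[agent]
  interval = "10m"
  # Assumption: align to the rounded tens so the offset is measured from them.
  round_interval = true
  # Assumption: no jitter, written out explicitly for clarity.
  collection_jitter = "0s"
  # Shift every collection by 6 minutes: :06, :16, :26, :36, :46, :56.
  collection_offset = "6m"
```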
precision of your timestamp
The precision setting doesn’t control how your data gets collected, but it matters for the granularity of the timestamps on that data. With the precision setting, you can set the precision of your timestamps by specifying an integer and a unit (e.g., 1s). Your precision can be set anywhere from seconds down to nanoseconds. It is important to note that precision will not be used for service input plugins such as statsd.
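As a small sketch, second-level precision could be set in the agent section like this. The value "1s" is the example given above; the other unit strings in the comment are listed on the assumption that they follow the same integer-plus-unit form.

```toml
[agent]
  # Round collected timestamps to whole seconds.
  # Finer settings follow the same form, for example "1ms", "1us", or "1ns".
  precision = "1s"
```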
[Table: example timestamp in Unix format for each precision]
Check out this video from Josh Powers where he discusses some of these settings along with more Telegraf agent settings.
If you have any questions on how to use these data collection settings in your Telegraf configuration, feel free to reach out on our community site or Slack. If you want to dive in deeper with Telegraf, InfluxDB University provides a full list of courses covering Telegraf, InfluxDB, Flux and so much more.