TL;DR InfluxDB Tech Tips - Shard Group Duration Recommendations

In this weekly post we recap the most interesting InfluxDB and TICK-stack related issues, workarounds, how-tos, and Q&A from GitHub, IRC, and the InfluxDB Google Group that you might have missed in the last week or so.

Continuous Queries with Latency

Q: I have a system that sends data to InfluxDB every minute. Because of how it’s set up, there could be up to one minute of latency before the data actually end up in the database. I’m working on creating a Continuous Query to calculate the sum of one of my fields at 30-second intervals. My worry is that because of the latency, the CQ won’t catch all of the data.

I’ve tried using an offset interval in the CQ but it doesn’t seem to be doing what I want. Any advice on using CQs on potentially latent data?

A: You’ll want to include a FOR clause in your Continuous Query. The FOR clause tells the CQ to recalculate prior intervals and pick up any newly-written points in those intervals.

CREATE CONTINUOUS QUERY latent_love ON <database_name>
RESAMPLE EVERY 30s FOR 2m              <--- Resamples the previous 2 minutes' worth of data
BEGIN
  SELECT [...]
END

Note that the offset interval that you mentioned just moves the beginnings of the CQ’s time buckets; delaying the query is just a side effect. Check out the CQ docs for more on the FOR interval!
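
To make the effect of the FOR clause concrete, here is a rough Python model of which time buckets a single CQ run would recompute. It is only a sketch: it assumes the CQ groups by 30-second intervals, that each run covers the range from now() minus the FOR interval up to now(), and that the run time falls on a bucket boundary. The function name `resample_windows` is made up for illustration and is not part of InfluxDB.

```python
from datetime import datetime, timedelta, timezone


def resample_windows(run_time, group_by, for_interval):
    """Return the [start, end) buckets one CQ run would recompute.

    Assumes run_time is aligned to a bucket boundary and that the run
    covers the range [run_time - for_interval, run_time).
    """
    start = run_time - for_interval
    windows = []
    while start < run_time:
        windows.append((start, start + group_by))
        start += group_by
    return windows


# EVERY 30s FOR 2m with 30-second buckets: each run revisits four buckets.
run = datetime(2016, 1, 1, 8, 0, 0, tzinfo=timezone.utc)
for start, end in resample_windows(run, timedelta(seconds=30), timedelta(minutes=2)):
    print(start.time(), "->", end.time())
```

Under these assumptions, a point that arrives up to one minute late still lands in a bucket that a later run recomputes, because every bucket is revisited for two minutes after it closes.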

Shard Group Duration Recommendations

Q: I read the documentation on retention policies and I’m thinking about configuring my shard group durations. Is there anything I should avoid or need to be aware of when I embark on this journey?

A: In general, shorter shard group durations allow the system to drop data efficiently. When InfluxDB enforces a retention policy (RP), it drops entire shard groups, not individual data points. For example, if your RP has a duration of one day, it makes sense to have a shard group duration of one hour; InfluxDB will drop an hour's worth of data every hour.
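
As an illustration of the one-day RP with one-hour shard groups, the Python sketch below builds the matching InfluxQL statement and counts how many shard groups span the RP. The database name `spooky`, the RP name `one_day`, the replication factor of 1, and the helper `create_rp_statement` are placeholders chosen for the example, not a recommendation.

```python
from datetime import timedelta


def create_rp_statement(db, rp, duration, shard_duration, replication=1):
    """Build an InfluxQL CREATE RETENTION POLICY statement (illustrative helper)."""
    return (
        f'CREATE RETENTION POLICY "{rp}" ON "{db}" '
        f"DURATION {duration} REPLICATION {replication} "
        f"SHARD DURATION {shard_duration}"
    )


statement = create_rp_statement("spooky", "one_day", "1d", "1h")
print(statement)

# A one-day RP split into one-hour shard groups spans 24 of them,
# so each enforcement pass only ever has to drop small, whole groups.
shard_groups_per_rp = timedelta(days=1) // timedelta(hours=1)
print(shard_groups_per_rp)
```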

If your RP’s duration is greater than six months, there’s no need to have a short shard group duration. In fact, increasing the shard group duration beyond the default seven-day value can improve compression, improve write speed, and decrease the fixed iterator overhead per shard group. Shard group durations of 50 years and over, for example, are acceptable configurations.

We recommend configuring the shard group duration such that:

  • it is two times your longest typical query's time range
  • each shard group has at least 100,000 points
  • each shard group has at least 1,000 points per series
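
One way to turn the three guidelines above into a starting number is sketched below in Python. It treats each guideline as a lower bound and takes the largest, which is an interpretation rather than an official formula. It also assumes a steady write rate spread evenly across series. The function name and its parameters are hypothetical.

```python
from datetime import timedelta


def min_shard_group_duration(longest_query, points_per_sec, series_count):
    """Smallest duration that meets all three guidelines (rough estimate).

    Assumes a constant total write rate that is spread evenly over
    series_count series.
    """
    candidates = [
        # Twice the longest typical query's time range.
        2 * longest_query,
        # Long enough to collect 100,000 points in the shard group.
        timedelta(seconds=100_000 / points_per_sec),
        # Long enough to collect 1,000 points for each series.
        timedelta(seconds=1_000 * series_count / points_per_sec),
    ]
    return max(candidates)


# 6-hour queries, 10 points per second in total, 100 series.
print(min_shard_group_duration(timedelta(hours=6), 10, 100))
```

Here the query-range guideline dominates, so the estimate is 12 hours; with a slower write rate, the point-count guidelines would push the duration higher.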

Retention Policies and the HTTP API

Q: I’m attempting to query data in a non-DEFAULT retention policy with the HTTP API. Currently, I’m using the rp query string parameter; is there any other way to do this?

A: Yes! You can fully qualify the measurement in the query’s FROM clause. Fully qualify a measurement by specifying its database and retention policy in the following format:

<database_name>.<retention_policy_name>.<measurement_name>

In the request below, the query specifies the geist measurement in the one_day retention policy in the spooky database:

curl -G 'http://localhost:8086/query' --data-urlencode 'q=SELECT value FROM spooky.one_day.geist'
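
If you are building the request from code rather than curl, the same URL can be assembled with Python's standard library. This is a minimal sketch: `build_query_url` is a made-up helper, the host and port come from the request above, and nothing is actually sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base, database, rp, measurement, field="value"):
    """Build a /query URL whose FROM clause is fully qualified (illustrative)."""
    q = f"SELECT {field} FROM {database}.{rp}.{measurement}"
    return f"{base}/query?{urlencode({'q': q})}"


url = build_query_url("http://localhost:8086", "spooky", "one_day", "geist")
print(url)

# Decoding the query string gives back the original InfluxQL statement.
print(parse_qs(urlsplit(url).query)["q"][0])
```

Because the measurement is fully qualified, the URL needs neither a db nor an rp parameter.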

For more InfluxDB tips, see our Frequently Asked Questions page and feel free to post your questions in the InfluxDB users group!
