TL;DR InfluxDB Tech Tips - Creating A Subsample With A Continuous Query

Navigate to:

In this weekly post we recap the most interesting InfluxDB and TICK-stack related issues, how to create a subsample with a continuous query, workarounds, and Q&A from GitHub, IRC and the InfluxDB Google Group that you might have missed in the last week or so.

Changing a field's data type

Q: Can I change field’s data type?

A: Currently, InfluxDB offers very limited support for changing a field’s data type.

The <field_key>::<type> syntax supports casting field values from integers to floats or from floats to integers. See Cast Operations for an example. There is no way to cast a float or integer to a string or boolean (or vice versa).

There are a couple workarounds if the cast operations are too limited for your use case. Note that these workarounds will not update data that have already been written to the database.

Option 1: Write the Data to a Different Field

The simplest workaround is to begin writing the new data type to a different field in the same series.

Option 2: Work the Shard System

Field value types cannot differ within a shard but they can differ across shards.

Users looking to change a field’s data type can use the SHOW SHARDS query to identify the end_time of the current shard. InfluxDB will accept writes with a different data type to an existing field if the point has a timestamp that occurs after that end_time.

Note that this will not change the field’s data type on prior shards. For how this will affect your queries, please see how does InfluxDB handle field type discrepancies across shards.

Interpreting the FOR duration error

Q: I’m trying to create a Continuous Query and I keep getting an error (see below). It says my FOR duration must be greater than or equal to my GROUP BY time() duration - but my FOR duration is greater than my GROUP BY time() duration! Any advice?

Continuous Query:

CREATE CONTINUOUS QUERY "max_pumpkins" ON "sales"
RESAMPLE EVERY 3h FOR 2h
BEGIN
SELECT max("pumpkins") INTO "max_pumpkins" FROM "patch" GROUP BY time(1h)
END

Error:

error parsing query: FOR duration must be >= GROUP BY time duration

A: Currently, InfluxDB returns that error whenever the FOR duration is less than the execution interval . In your case, the execution interval is determined by the EVERY duration, not the GROUP BY time() duration (so, technically, the error should say FOR duration must be >= EVERY duration).

Beyond the misleading error wording, the reason you’re getting the error is because your Continuous Query would miss data between execution times. If max_pumpkins were to run as is, it would execute a query every three hours (that’s your EVERY duration). That query would cover the time between now() and now() minus two hours (that’s your FOR duration) - you’d end up not having results for every third hour. To avoid gaps in data coverage, you’ll need to increase your FOR duration to 3h.

If this configuration was intentional, there is an open feature request for Continuous Queries to support gaps in data coverage.

Creating a subsample with a Continuous Query

Q: I want to use a Continuous Query to move data with a particular tag from one measurement to another measurement and retention policy. I don’t want to manipulate the data in any way, but it looks like Continuous Queries require an aggregation function. Is there a way around this?

Raw data (where zombies is a tag):

> SELECT * FROM "mydb"."two_days"."haunted_house"
name: haunted_house
-------------------
time                   scare_index   zombies
2016-10-05T20:00:00Z   3             false
2016-10-05T20:00:00Z   12            true
2016-10-05T20:10:00Z   6             false
2016-10-05T20:10:00Z   13            true
2016-10-05T20:20:00Z   2             false
2016-10-05T20:20:00Z   11            true

What I’d like in a different retention policy/measurement:

> SELECT * FROM "mydb"."seven_days"."zombie_effects"
name: haunted_house
-------------------
time                   scare_index   zombies
2016-10-05T20:00:00Z   12            true
2016-10-05T20:10:00Z   13            true
2016-10-05T20:20:00Z   11            true

A: Continuous Queries do require a function but not necessarily an aggregator function. You could use a selector function to satisfy the function requirement and not change the data in any way.

Assuming your data will always occur at ten-second intervals, the following CQ will give you your subsample:

CREATE CONTINUOUS QUERY "true_zombie" ON "mydb"
RESAMPLE EVERY 20s FOR 20s
BEGIN
  SELECT first("scare_index")
  INTO "mydb"."seven_days"."zombie_effects"
  FROM "mydb"."two_days"."haunted_house"
  WHERE "zombies" = 'true' 
  GROUP BY time(10s),*
END

For more InfluxDB tips, see our Frequently Asked Questions page and feel free to post your questions in the InfluxDB users group!

What's next?

  • Download and get started with InfluxDB!
  • Schedule a FREE 20 minute consultation with a Solutions Architect to review your InfluxDB project.
  • Attend one of our FREE virtual training seminars.
  • Got a question and need an immediate answer from the InfluxData Support team? Support subscriptions with unlimited incidents start at just $399 a month. Check out all the support options here.