TL;DR InfluxDB Tech Tips - Creating A Subsample With A Continuous Query
Regan Kuchan /
Oct 06, 2016
In this weekly post we recap the most interesting InfluxDB and TICK-stack related issues, how to create a subsample with a continuous query, workarounds, and Q&A from GitHub, IRC and the InfluxDB Google Group that you might have missed in the last week or so.
Changing a field's data type
Q: Can I change field’s data type?
A: Currently, InfluxDB offers very limited support for changing a field’s data type.
<field_key>::<type> syntax supports casting field values from integers to floats or from floats to integers. See Cast Operations for an example. There is no way to cast a float or integer to a string or boolean (or vice versa).
There are a couple workarounds if the cast operations are too limited for your use case. Note that these workarounds will not update data that have already been written to the database.
Option 1: Write the Data to a Different Field
The simplest workaround is to begin writing the new data type to a different field in the same series.
Option 2: Work the Shard System
Field value types cannot differ within a shard but they can differ across shards.
Users looking to change a field’s data type can use the
SHOW SHARDS query to identify the
end_time of the current shard. InfluxDB will accept writes with a different data type to an existing field if the point has a timestamp that occurs after that
Note that this will not change the field’s data type on prior shards. For how this will affect your queries, please see how does InfluxDB handle field type discrepancies across shards.
FOR duration error
Q: I’m trying to create a Continuous Query and I keep getting an error (see below). It says my
FOR duration must be greater than or equal to my
GROUP BY time() duration - but my
FOR duration is greater than my
GROUP BY time() duration! Any advice?
CREATE CONTINUOUS QUERY "max_pumpkins" ON "sales" RESAMPLE EVERY 3h FOR 2h BEGIN SELECT max("pumpkins") INTO "max_pumpkins" FROM "patch" GROUP BY time(1h) END
error parsing query: FOR duration must be >= GROUP BY time duration
A: Currently, InfluxDB returns that error whenever the
FOR duration is less than the execution interval . In your case, the execution interval is determined by the
EVERY duration, not the
GROUP BY time() duration (so, technically, the error should say
FOR duration must be >= EVERY duration).
Beyond the misleading error wording, the reason you’re getting the error is because your Continuous Query would miss data between execution times. If
max_pumpkins were to run as is, it would execute a query every three hours (that’s your
EVERY duration). That query would cover the time between
now() minus two hours (that’s your
FOR duration) - you’d end up not having results for every third hour. To avoid gaps in data coverage, you’ll need to increase your
FOR duration to
If this configuration was intentional, there is an open feature request for Continuous Queries to support gaps in data coverage.
Creating a subsample with a Continuous Query
Q: I want to use a Continuous Query to move data with a particular tag from one measurement to another measurement and retention policy. I don’t want to manipulate the data in any way, but it looks like Continuous Queries require an aggregation function. Is there a way around this?
Raw data (where
zombies is a tag):
> SELECT * FROM "mydb"."two_days"."haunted_house" name: haunted_house ------------------- time scare_index zombies 2016-10-05T20:00:00Z 3 false 2016-10-05T20:00:00Z 12 true 2016-10-05T20:10:00Z 6 false 2016-10-05T20:10:00Z 13 true 2016-10-05T20:20:00Z 2 false 2016-10-05T20:20:00Z 11 true
What I’d like in a different retention policy/measurement:
> SELECT * FROM "mydb"."seven_days"."zombie_effects" name: haunted_house ------------------- time scare_index zombies 2016-10-05T20:00:00Z 12 true 2016-10-05T20:10:00Z 13 true 2016-10-05T20:20:00Z 11 true
A: Continuous Queries do require a function but not necessarily an aggregator function. You could use a selector function to satisfy the function requirement and not change the data in any way.
Assuming your data will always occur at ten-second intervals, the following CQ will give you your subsample:
CREATE CONTINUOUS QUERY "true_zombie" ON "mydb" RESAMPLE EVERY 20s FOR 20s BEGIN SELECT first("scare_index") INTO "mydb"."seven_days"."zombie_effects" FROM "mydb"."two_days"."haunted_house" WHERE "zombies" = 'true' GROUP BY time(10s),* END
- Download and get started with InfluxDB!
- Schedule a FREE 20 minute consultation with a Solutions Architect to review your InfluxDB project.
- Attend one of our FREE virtual training seminars.
- Got a question and need an immediate answer from the InfluxData Support team? Support subscriptions with unlimited incidents start at just $399 a month. Check out all the support options here.