TL;DR InfluxDB Tech Tips - InfluxDB Kapacitor

Navigate to:

In this post we recap the week’s most interesting InfluxDB Kapacitor and TICK-stack related issues, workarounds, how-tos and Q&A from GitHub, IRC and the InfluxDB Google Group that you might have missed.

Preserving tags with INTO queries

Q: I’m using an INTO query to move data to a new measurement. InfluxDB says it wrote the correct number of points, but when I query the data in the new measurement, I’m missing a point.

In my data, color is a tag and name is a field.

> SELECT * FROM "french_bulldogs"
name: french_bulldogs
---------------------
time                  color  name
2016-05-25T00:05:00Z  peach  nugget
2016-05-25T00:05:00Z  grey   rumple
2016-05-25T00:10:00Z  black  prince

> SELECT * INTO "all_dogs" FROM "french_bulldogs"
name: result
------------
time                  written
1970-01-01T00:00:00Z  3

> SELECT * FROM "all_dogs"
name: all_dogs
--------------
time                  color  name
2016-05-25T00:05:00Z  grey   rumple
2016-05-25T00:10:00Z  black  prince

What happened to nugget?

A: If you use SELECT * with INTO, InfluxDB converts any tags in the current measurement to fields in the new measurement. This can cause InfluxDB to overwrite points that were previously differentiated by a tag.

In your case, InfluxDB converts color from a tag to a field when it writes data to all_dogs. Because nugget and rumple have the same timestamp and are no longer differentiated by a tag, InfluxDB overwrites the nugget point with the rumple point.

Use GROUP BY * to preserve tags as tags with INTO queries:

> SELECT "name" INTO "all_dogs" FROM "french_bulldogs" GROUP BY *
name: result
------------
time                  written
1970-01-01T00:00:00Z  3

> SELECT * FROM "all_dogs"
name: all_dogs
--------------
time                  color  name
2016-05-25T00:05:00Z  peach  nugget
2016-05-25T00:05:00Z  grey   rumple
2016-05-25T00:10:00Z  black  prince

To stream or not to stream with Kapacitor

Q: I’m using Kapacitor to join stream data with shifted stream data and I write the joined data back to InfluxDB. Will this work if the shift covers a long time range (for example, 24 hours)? Can Kapacitor handle that amount of data in RAM?

var current_data = stream
   | from()
     .measurement('sparkle_trees')

var previous_data = stream
   | from()
     .measurement('sparkle_trees')
   | shift(24h)

current_data
   |join(previous_data)
    .tolerance(10s)
    .as('current_data','previous_data')
   |influxDBOut()
    .database('enchanted_forest')
    .retentionPolicy('default')
    .measurement('joined_data')

A: Your TICKscript will work but, in practice, it’s not a good idea. First, every time you restart Kapacitor you will have to wait 24 hours before it does any joining. Second, you will have to buffer 24 hours worth of data in RAM; this is impractical unless you have a very small ingestion rate.

We recommend using a batch task instead of a stream task. With a batch task, Kapacitor can simply query old data and start immediately processing the join when Kapacitor first starts up. In addition, Kapacitor only needs to fit data for a single batch in RAM.

For example:

var current_data = batch
   | query('SELECT * FROM "enchanted_forest"."default"."sparkle_trees"')
     .period(5m)
     .every(5m)

var previous_data = batch
   | query('SELECT * FROM "enchanted_forest"."default"."sparkle_trees"')
     .period(5m)
     .every(5m)
     // Select data -24h from the current time
     .offset(24h)
   // Shift data to match current time
   | shift(24h)

current_data
   |join(previous_data)
    .as('current_data','previous_data')
   |influxDBOut()
    .database('enchanted_forest')
    .retentionPolicy('default')
    .measurement('joined_data')

When to double quote identifiers in queries

Q: I’ve successfully written data to InfluxDB, but I get an unexpected error when I query my data:

> INSERT monsters exists=true
> SELECT exists FROM monsters
ERR: error parsing query: found EXISTS, expected identifier, string, number, bool at line 1, char 8

A: The field key exists is an InfluxQL keyword. Identifiers that are also InfluxQL keywords must be double-quoted in queries:

> SELECT "exists" FROM monsters
name: monsters
--------------
time				             exists
2016-05-25T00:36:01.714188435Z   true
2016-05-25T00:36:19.30227374Z	 true

See our docs for a complete list of InfluxQL keywords.

For more InfluxDB tips, check out our Frequently Encountered Issues page and feel free to post your questions in the InfluxDB users group.

What's next?