TL;DR InfluxDB Tech Tips – June 16, 2016

Regan Kuchan — June 16, 2016

In this post we recap the week’s most interesting InfluxDB and TICK-stack related issues, workarounds, how-tos and Q&A from GitHub, IRC and the InfluxDB Google Group that you might have missed.

Working with Data Across Measurements

Q: I have two measurements: `cow_groups` and `cow_activity`. I want to `GROUP BY` a tag in `cow_groups` and calculate the average of a field in `cow_activity`.

This is how my data are organized:

> SELECT * FROM cow_groups
name: cow_groups
time                    cow_group_id           cow_id    value
2016-06-12T23:10:00Z    consistent producer    3         1
2016-06-12T23:10:00Z    food motivated         2         1
2016-06-12T23:10:00Z    food motivated         1         1
2016-06-12T23:10:00Z    cool kids              1         1

> SELECT * FROM cow_activity
name: cow_activity
time                    cow_id   grass_consump  milk_prod
2016-06-12T23:10:00Z    1        10             60
2016-06-12T23:10:00Z    3        9              70
2016-06-12T23:10:00Z    2        7              71

I know InfluxDB doesn’t do `JOIN`s but is there a way to perform something like the following query?

SELECT mean("milk_prod") FROM "cow_activity" GROUP BY "cow_groups"."cow_group_id"

A: You are correct. InfluxDB 0.9+ doesn’t support `JOIN`s. Instead, all series in a given measurement are automatically merged unless distinguished by explicit filters in the `WHERE` clause or the `GROUP BY` clause.

There is no way to perform cross-measurement math or grouping. All data must be under a single measurement to query it together. We recommend re-organizing your schema so that all of your data live in one measurement. InfluxDB is not a relational database and mapping data across measurements is not a great schema.

Recommended schema (`cow_group_id` and `cow_id` are tags):

> SELECT * FROM "cows"
name: cows
time                  cow_group_id         cow_id  grass_consump   milk_prod
2016-06-12T23:10:00Z  consistent producer  3	   9		   70
2016-06-12T23:10:00Z  food motivated	   2	   7               71
2016-06-12T23:10:00Z  food motivated	   1	   10              60
2016-06-12T23:10:00Z  cool kids            1	   10              60

# That schema makes this query possible:

> SELECT mean("milk_prod") FROM "cows" GROUP BY "cow_group_id"
name: cows
tags: cow_group_id=consistent producer
time                  mean
----                  ----
1970-01-01T00:00:00Z  70

name: cows
tags: cow_group_id=cool kids
time                  mean
----                  ----
1970-01-01T00:00:00Z  60

name: cows
tags: cow_group_id=food motivated
time                  mean
----                  ----
1970-01-01T00:00:00Z  65.5


Querying Tag Values

Q: I successfully wrote several points to the database and a simple `SELECT *` query works, but when I specify a tag in the `WHERE` clause I get no results. What am I missing?

> INSERT farm,patch_id=1 blueberries=15 1465948800000000000
> INSERT farm,patch_id=2 blueberries=32 1465948800000000000
> INSERT farm,patch_id=1 blueberries=12 1465952400000000000
> SELECT * FROM "farm" 
name: farm
time                  blueberries  patch_id
2016-06-15T00:00:00Z  15           1
2016-06-15T00:00:00Z  32           2
2016-06-15T01:00:00Z  12           1
> SELECT * FROM "farm" WHERE time <= now() + 5h and "patch_id"=1
> SELECT * FROM "farm" WHERE time <= now() + 5h and "patch_id"="1"

A: Use single quotes to specify a tag value in the `WHERE` clause:

> SELECT * FROM "farm" WHERE "patch_id"='1'
name: farm
time                  blueberries	patch_id
2016-06-15T00:00:00Z  15                1
2016-06-15T01:00:00Z  12                1

Understanding Timestamps Returned by GROUP BY time() queries

Q: My query has a 20 minute `GROUP BY time()` interval and specifies a 20 minute time range in the `WHERE` clause. Interestingly, I get results for two time intervals instead of just one. What’s going on?

> SELECT max("cookies") FROM "my_kitchen" WHERE time >= '2016-06-14T15:30:00Z' and time <= '2016-06-14T15:40:00Z' GROUP BY time(20m)
name: my_kitchen
time                  max
2016-06-14T15:20:00Z  20
2016-06-14T15:40:00Z  18

A: You’re getting two results because of the way InfluxDB handles time boundaries with `GROUP BY time()` queries.

By default, InfluxDB returns rounded calendar time boundaries. In your case InfluxDB automatically groups together `2016-06-14T15:20:00Z` and `2016-06-14T15:30:00Z` as the first 20 minute interval and `2016-06-14T15:40:00Z` and `2016-06-14T15:50:00Z` as the next 20 minute interval. Note that the results in the `max` column only consider data that fall within the time range in the `WHERE` clause.

You can alter the default rounded calendar time boundaries by including an offset interval in your `GROUP BY time()` clause. The following query tells InfluxDB to add ten minutes to the default time boundaries so that the results begin at `2016-06-14T15:30:00Z` instead of `2016-06-14T15:20:00Z`:

> SELECT max("cookies") FROM "my_kitchen" WHERE time >= '2016-06-14T15:30:00Z' and time <= '2016-06-14T15:40:00Z' GROUP BY time(20m,10m)
name: my_kitchen
time                  max
2016-06-14T15:30:00Z  20

Check out Data Exploration for documentation on the offset interval.

For more InfluxDB tips, check out our Frequently Asked Questions page and feel free to post your questions in the InfluxDB users group.

What’s next?