TL;DR InfluxDB Tech Tips - Querying Tag Value, Data Across Measurements, Understanding Timestamps

Navigate to:

In this post we recap the week’s most interesting InfluxDB and TICK-stack related issues, workarounds, how-tos and Q&A from GitHub, IRC and the InfluxDB Google Group that you might have missed.

Working with Data Across Measurements

Q: I have two measurements: cow_groups and cow_activity. I want to GROUP BY a tag in cow_groups and calculate the average of a field in cow_activity.

This is how my data are organized:

> SELECT * FROM cow_groups
name: cow_groups
----------------
time                    cow_group_id           cow_id    value
2016-06-12T23:10:00Z    consistent producer    3         1
2016-06-12T23:10:00Z    food motivated         2         1
2016-06-12T23:10:00Z    food motivated         1         1
2016-06-12T23:10:00Z    cool kids              1         1

> SELECT * FROM cow_activity
name: cow_activity
------------------
time                    cow_id   grass_consump  milk_prod
2016-06-12T23:10:00Z    1        10             60
2016-06-12T23:10:00Z    3        9              70
2016-06-12T23:10:00Z    2        7              71

I know InfluxDB doesn’t do JOINs but is there a way to perform something like the following query?

SELECT mean("milk_prod") FROM "cow_activity" GROUP BY "cow_groups"."cow_group_id"

A: You are correct. InfluxDB 0.9+ doesn’t support JOINs. Instead, all series in a given measurement are automatically merged unless distinguished by explicit filters in the WHERE clause or the GROUP BY clause.

There is no way to perform cross-measurement math or grouping. All data must be under a single measurement to query it together. We recommend re-organizing your schema so that all of your data live in one measurement. InfluxDB is not a relational database and mapping data across measurements is not a great schema.

Recommended schema (cow_group_id and cow_id are tags):

> SELECT * FROM "cows"
name: cows
----------
time                  cow_group_id         cow_id  grass_consump   milk_prod
2016-06-12T23:10:00Z  consistent producer  3	   9		   70
2016-06-12T23:10:00Z  food motivated	   2	   7               71
2016-06-12T23:10:00Z  food motivated	   1	   10              60
2016-06-12T23:10:00Z  cool kids            1	   10              60

# That schema makes this query possible:

> SELECT mean("milk_prod") FROM "cows" GROUP BY "cow_group_id"
name: cows
tags: cow_group_id=consistent producer
time                  mean
----                  ----
1970-01-01T00:00:00Z  70

name: cows
tags: cow_group_id=cool kids
time                  mean
----                  ----
1970-01-01T00:00:00Z  60

name: cows
tags: cow_group_id=food motivated
time                  mean
----                  ----
1970-01-01T00:00:00Z  65.5

 

Querying Tag Values

Q: I successfully wrote several points to the database and a simple SELECT * query works, but when I specify a tag in the WHERE clause I get no results. What am I missing?

> INSERT farm,patch_id=1 blueberries=15 1465948800000000000
> INSERT farm,patch_id=2 blueberries=32 1465948800000000000
> INSERT farm,patch_id=1 blueberries=12 1465952400000000000
> SELECT * FROM "farm" 
name: farm
----------
time                  blueberries  patch_id
2016-06-15T00:00:00Z  15           1
2016-06-15T00:00:00Z  32           2
2016-06-15T01:00:00Z  12           1
> SELECT * FROM "farm" WHERE time <= now() + 5h and "patch_id"=1
> SELECT * FROM "farm" WHERE time <= now() + 5h and "patch_id"="1"
>

A: Use single quotes to specify a tag value in the WHERE clause:

> SELECT * FROM "farm" WHERE "patch_id"='1'
name: farm
----------
time                  blueberries	patch_id
2016-06-15T00:00:00Z  15                1
2016-06-15T01:00:00Z  12                1

Understanding Timestamps Returned by GROUP BY time() queries

Q: My query has a 20 minute GROUP BY time() interval and specifies a 20 minute time range in the WHERE clause. Interestingly, I get results for two time intervals instead of just one. What’s going on?

> SELECT max("cookies") FROM "my_kitchen" WHERE time >= '2016-06-14T15:30:00Z' and time <= '2016-06-14T15:40:00Z' GROUP BY time(20m)
name: my_kitchen
----------------
time                  max
2016-06-14T15:20:00Z  20
2016-06-14T15:40:00Z  18

A: You’re getting two results because of the way InfluxDB handles time boundaries with GROUP BY time() queries.

By default, InfluxDB returns rounded calendar time boundaries. In your case InfluxDB automatically groups together 2016-06-14T15:20:00Z and 2016-06-14T15:30:00Z as the first 20 minute interval and 2016-06-14T15:40:00Z and 2016-06-14T15:50:00Z as the next 20 minute interval. Note that the results in the max column only consider data that fall within the time range in the WHERE clause.

You can alter the default rounded calendar time boundaries by including an offset interval in your GROUP BY time() clause. The following query tells InfluxDB to add ten minutes to the default time boundaries so that the results begin at 2016-06-14T15:30:00Z instead of 2016-06-14T15:20:00Z:

> SELECT max("cookies") FROM "my_kitchen" WHERE time >= '2016-06-14T15:30:00Z' and time <= '2016-06-14T15:40:00Z' GROUP BY time(20m,10m)
name: my_kitchen
----------------
time                  max
2016-06-14T15:30:00Z  20

Check out Data Exploration for documentation on the offset interval.

For more InfluxDB tips, check out our Frequently Asked Questions page and feel free to post your questions in the InfluxDB users group.

What's next?