TL;DR InfluxDB Tech Tips - All About the New Subqueries!

Navigate to:

In this post we introduce subqueries; a new InfluxQL feature in InfluxDB version 1.2. Check in next week for more on the most interesting InfluxDB and TICK-stack related issues, workarounds, how-tos and Q&A from GitHub, IRC and the InfluxDB Google Group.

What is a subquery?

Subqueries are a new feature in InfluxDB version 1.2. They increase the functionality of your InfluxQL queries and allow you to gain more granular, meaningful insights into your data.

A subquery is a query that’s nested in the FROM clause of another query. You can think of a subquery as a query that serves as a condition in a separate query. The syntax is pretty straightforward:

SELECT_clause FROM (SELECT_statement) [...]

where the subquery is the SELECT_statement that appears inside parentheses in the FROM clause of the main query, and the main query is everything outside of those parentheses. Both the subquery and the main query support every InfluxQL clause. When the system executes a query, it runs the subquery and then the main query on the results of that subquery.

When would I use a subquery?

Perform functions on functions on functions

Execute a function on the results of another function. This use case is equivalent to nested functions. The following query calculates the average value of passengers for every subway tag and returns only the maximum average value:

> SELECT MAX("mean") FROM (SELECT MEAN("passengers") FROM "schedule" GROUP BY "subway")

name: schedule
time                  max
----                  ---
1970-01-01T00:00:00Z  32.75

Notice that the SELECT clause in the main query specifies mean as the field key in the MAX() function. To understand how InfluxDB calculates those results, we’ve included the (annotated) output from just the subquery below:

> SELECT MEAN("passengers") FROM "schedule" GROUP BY "subway"

name: schedule
tags: subway=U2
time                  mean   <---- The relevant field key for the main query
----                  ----
1970-01-01T00:00:00Z  32.75  <---- This is the max value that gets returned

name: schedule
tags: subway=U6
time                  mean   <---- The relevant field key for the main query
----                  ----
1970-01-01T00:00:00Z  18

Perform additional analysis on the results of another query

Another use for subqueries is to execute a function on the results of a mathematical operation. This use case is equivalent to performing mathematical operations within a function. The following query calculates the number of spilled_coffees per passenger and returns the average of  those quotients:

> SELECT MEAN("spills_per_person") FROM (SELECT "spilled_coffee"/"passengers" AS "spills_per_person" FROM "schedule" GROUP BY "subway")

name: schedule
time                  mean
----                  ----
1970-01-01T00:00:00Z  0.13206144512134227

Notice that the SELECT clause in the main query specifies spills_per_person as the field key in the MEAN function. To understand how InfluxDB calculates those results, we’ve included the (annotated) output from just the subquery below:

> SELECT "spilled_coffee"/"passengers" AS "spills_per_person" FROM "schedule" GROUP BY "subway"
name: schedule
tags: subway=U2
time                  spills_per_person   <---- The relevant field key for the main query
----                  -----------------
2017-01-25T18:00:00Z  0.029411764705882353
2017-01-25T18:05:00Z  0.05555555555555555
2017-01-25T18:10:00Z  0.13333333333333333
2017-01-25T18:15:00Z  0.16129032258064516

name: schedule
tags: subway=U6
time                  spills_per_person   <---- The relevant field key for the main query
----                  -----------------
2017-01-25T18:00:00Z  0.2
2017-01-25T18:05:00Z  0.1111111111111111
2017-01-25T18:10:00Z  0.05
2017-01-25T18:15:00Z  0.3157894736842105

Place specific conditions on the results of another query

Execute a function and return only those results that meet a specific condition. This use case is similar to SQL’s HAVING clauses. The following query calculates the minimum number of passengers at ten-minute intervals and returns only those minimum values that are greater than 15:

> SELECT "min" FROM (SELECT MIN("passengers") FROM "schedule" WHERE time >= '2017-01-25T18:00:00Z' AND time <= '2017-01-25T18:15:00Z' GROUP BY time(10m)) WHERE "min" > 15

name: schedule
time                  min
----                  ---
2017-01-25T18:10:00Z  19

Notice that the SELECT and WHERE clauses in the main query specify min as the field key of interest. To understand how InfluxDB calculates those results, we’ve included the (annotated) output from just the subquery below:

> SELECT MIN("passengers") FROM "schedule" WHERE time >= '2017-01-25T18:00:00Z' AND time <= '2017-01-25T18:15:00Z' GROUP BY time(10m)

name: schedule
time                  min   <---- The relevant field key for the main query
----                  ---
2017-01-25T18:00:00Z  15
2017-01-25T18:10:00Z  19    <---- This is the only minimum value that gets returned

What else is there?

You can find the documentation for subqueries on the Data Exploration page. This is our first implementation of subqueries in InfluxQL - feel free to open an issue on GitHub if you find any unexpected behavior. We’d love to get your feedback!

What's next:

  • Downloads for the TICK-stack are live on our "downloads" page.
  • Deploy on the Cloud: Get started with a FREE trial of InfluxDB Cloud featuring fully-managed clusters, Kapacitor and Grafana.
  • Deploy on Your Servers: Want to run InfluxDB clusters on your servers? Try a FREE 14-day trial of InfluxDB Enterprise featuring an intuitive UI for deploying, monitoring and rebalancing clusters, plus managing backups and restores. 
  • Tell Your Story: Over 300 companies have shared their story on how InfluxDB is helping them succeed. Submit your testimonial and get a limited edition hoodie as a thank you.