How to Pivot Your Data in Flux: Working with Columnar Data

Navigate to:


Relational databases are by far the most common type of database, and as software developers it’s safe to say that they are the kind of database most of us got started on, and probably still use on a regular basis. And one thing that they all have in common is the way they structure data. InfluxDB, however, structures data a little bit differently.

RDBMS default table structure

With a relational database you have a collection of tables, each with its own fixed layout of columns, and you read and write data in rows. Each row has a column that holds a value for each field, and you get all of those values at the same time, giving you a table that looks like this:

time field1 field2 host
<data_tstamp> 123 789 myhost1
<data_tstamp> 456 101 myhost2

InfluxDB result data

InfluxDB doesn’t follow this pattern. It returns data in what’s known as a columnar format, which just means that you will get the data one column at a time, rather than one row at a time. This is often a source of confusion for developers who are used to the row format they worked with from a relational database, but it has many benefits when working with time series data.

When you run a basic Flux query like this:

from(bucket: "telegraf")
  |> range(start: -1d)

You will get data returned like this:

table _measurement* _field* _value _start* _stop* _time host*
0 mydata field1 123 <query_start> <query_stop> <data_tstamp> myhost1
0 mydata field1 456 <query_start> <query_stop> <data_tstamp> myhost2
1 mydata field2 789 <query_start> <query_stop> <data_tstamp> myhost1
1 mydata field2 101 <query_start> <query_stop> <data_tstamp> myhost2

The first thing you’ll notice is that the table’s column headers are NOT your field names like they would be if you were querying a SQL database. Instead your query will return your results with these same columns, and each record (represented by r in the query below) in the result will represent a single field’s value at that point in time.

This is why when you filter your Flux query, it looks like this:

|> filter(fn: (r) => r["_measurement"] == "mydata")
|> filter(fn: (r) => r["_field"] == "field1")

This data structure allows for faster queries and aggregations in the most common usage patterns for InfluxDB. Because InfluxDB is processing one column’s values at a time, it can perform calculations on the data for just that field instead of performing multiple calculations at the same time.

Pivoting between column-like and row-like data layouts

There are times, however, when you are going to want your data to look more like the results of a traditional database query. To do that you need to perform a “pivot” operation. Pivoting data is the act of flipping it from having the fields in separate rows to having the fields in separate columns.

Flux provides a pivot function that does precisely this:

|> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")

This function will flip your data so that every row represents a moment in time (rowKey:["_time"]), your fields will each become a column (columnKey: ["_field"]), and the value in each column will be your field’s value (valueColumn: "_value"). Giving you a result that looks like this:

table _measurement* _start* _stop* _time host* field1 field2
0 mydata <query_start> <query_stop> <data_tstamp> myhost1 123 789
0 mydata <query_start> <query_stop> <data_tstamp> myhost2 456 101

Using fieldsAsCols instead

This is the most common way people want to pivot their data, because it puts it into the same shape as they’re used to working with from a traditional database. It’s so common, in fact, that we provide a shortcut function in the schema package just for that use case.

After importing schema into your Flux script, the above line of code can be replaced with:.

Flux
  |> fieldsAsCols()

Other ways to pivot

Tags as columns

Even though it’s the most common way to pivot, fields are not the only thing you can flip up into column headers. Suppose you want to compare values not against different fields, but against the same field from different sources. For example, if you want to compare readings from different server hosts taken at the same time.

In our previous example, our data had a tag called host that contained this value.

table _measurement* _field* _value _start* _stop* _time host*
0 mydata field1 123 <query_start> <query_stop> <data_tstamp> myhost1
0 mydata field1 456 <query_start> <query_stop> <data_tstamp> myhost2
1 mydata field2 789 <query_start> <query_stop> <data_tstamp> myhost1
1 mydata field2 101 <query_start> <query_stop> <data_tstamp> myhost2

If we wanted to combine the values of each field from all of our hosts into a single record, we could do that with:

|> group()
  |> pivot(rowKey:["_time", "_field"], columnKey: ["host"], valueColumn: "_value")

The first line re-groups our data because InfluxDB by default will put values from different hosts (or any difference in tag values) into separate result tables. Since pivot works on one result table at a time, we have to call group first to put the values from both hosts into a single table, then we can pivot that combined table around the host values.

This query gives us a table structure that looks like this:

table _field _start _stop _time myhost1 myhost2
0 field1 <query_start> <query_stop> <data_tstamp> 123 456
0 field2 <query_start> <query_stop> <data_tstamp> 789 101

Combined columns

You can also use pivot to combine field names with tag values. Building off the above example, suppose you want to see the values of both field1 and field2 for each host in a single row?

We can do that with a slight modification to our previous function call:

|> group()
  |> pivot(rowKey:["_time"], columnKey: ["host", "_field"], valueColumn: "_value")

By moving the _field data from the rowKey to the columnKey, along with the host data, we get a table that combines both into unique columns.

table _start _stop _time myhost1_field1 myhost1_field2 myhost2_field1 myhost2_field2
0 <query_start> <query_stop> <data_tstamp> 123 789 456 101

Further reading