<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>InfluxData Blog - Stuart Carnie</title>
    <description>Posts by Stuart Carnie on the InfluxData Blog</description>
    <link>https://www.influxdata.com/blog/author/stuart-carnie/</link>
    <language>en-us</language>
    <lastBuildDate>Thu, 31 May 2018 11:50:29 -0700</lastBuildDate>
    <pubDate>Thu, 31 May 2018 11:50:29 -0700</pubDate>
    <ttl>1800</ttl>
    <item>
      <title>Schema Queries in Flux (formerly IFQL)</title>
      <description>&lt;p&gt;InfluxQL facilitates schema exploration via a number of meta queries, which include &lt;code class="language-markup"&gt;SHOW MEASUREMENTS&lt;/code&gt;, &lt;code class="language-markup"&gt;SHOW TAG KEYS&lt;/code&gt;, &lt;code class="language-markup"&gt;SHOW TAG VALUES&lt;/code&gt; and &lt;code class="language-markup"&gt;SHOW FIELD KEYS&lt;/code&gt;. Flux (formerly IFQL) has unified these concepts, such that a schema is made up of tag keys and values. This unification gives users greater flexibility to explore a schema, as we will show in the remainder of this post.&lt;/p&gt;
&lt;h2&gt;InfluxQL → Flux (formerly IFQL)&lt;/h2&gt;
&lt;p&gt;This section demonstrates translations of InfluxQL meta queries to their Flux equivalents.&lt;/p&gt;
&lt;h3&gt;&lt;code class="language-markup"&gt;SHOW MEASUREMENTS&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Measurement names are aggregated within the &lt;code class="language-markup"&gt;_measurement&lt;/code&gt; tag key. Therefore, we want to ask Flux to give us the distinct values for &lt;code class="language-markup"&gt;_measurement&lt;/code&gt;.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;from(db:"foo")
  |&amp;gt; range(start:-24h)
  |&amp;gt; group(by:["_measurement"])
  |&amp;gt; distinct(column:"_measurement")
  |&amp;gt; group(none:true)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Typing this all the time may become tedious, so we can write a helper function that queries a specified database for the last 24 hours as follows:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;showMeasurements = (db) =&amp;gt; from(db:db) 
  |&amp;gt; range(start:-24h)
  |&amp;gt; group(by:["_measurement"])
  |&amp;gt; distinct(column:"_measurement")
  |&amp;gt; group(none:true)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Resulting in a greatly simplified query to show measurements:&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;showMeasurements(db:"foo")&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We plan to formalize a number of helper functions for querying metadata in the near future.&lt;/p&gt;

&lt;p&gt;InfluxQL meta queries cannot be restricted by time, so it is worth calling out the use of &lt;code class="language-markup"&gt;range&lt;/code&gt; to restrict the results to only those measurements with data in the last 24 hours.&lt;/p&gt;
&lt;h3 class="line-numbers"&gt;&lt;code class="language-markup"&gt;SHOW TAG KEYS FROM cpu&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;List all the tag keys for the measurement &lt;code class="language-markup"&gt;cpu&lt;/code&gt;.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;from(db:"foo")
    |&amp;gt; range(start:-24h)
    |&amp;gt; filter(fn:(r) =&amp;gt; r._measurement == "cpu")
    |&amp;gt; keys()&lt;/code&gt;&lt;/pre&gt;
&lt;h3 class="line-numbers"&gt;&lt;code class="language-markup"&gt;SHOW TAG VALUES FROM cpu WITH KEY = "host"&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;List all the distinct values for a specific tag (&lt;code class="language-markup"&gt;host&lt;/code&gt;) in measurement &lt;code class="language-markup"&gt;cpu&lt;/code&gt;.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;from(db:"foo")
    |&amp;gt; range(start:-24h)
    |&amp;gt; filter(fn:(r) =&amp;gt; r._measurement == "cpu")
    |&amp;gt; group(by:["host"])
    |&amp;gt; distinct(column:"host")
    |&amp;gt; group(none:true)&lt;/code&gt;&lt;/pre&gt;
&lt;h3 class="line-numbers"&gt;&lt;code class="language-markup"&gt;SHOW FIELD KEYS FROM cpu&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;List all the fields for measurement &lt;code class="language-markup"&gt;cpu&lt;/code&gt;, which are aggregated under the &lt;code class="language-markup"&gt;_field&lt;/code&gt; tag key.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;from(db:"foo")
    |&amp;gt; range(start:-24h)
    |&amp;gt; filter(fn:(r) =&amp;gt; r._measurement == "cpu")
    |&amp;gt; group(by:["_field"])
    |&amp;gt; distinct(column:"_field")
    |&amp;gt; group(none:true)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The astute reader will notice this is the same query as the &lt;code class="language-markup"&gt;SHOW TAG VALUES&lt;/code&gt; example, with the exception of using the &lt;code class="language-markup"&gt;_field&lt;/code&gt; tag key.&lt;/p&gt;
&lt;h2&gt;Exploring Schema&lt;/h2&gt;
&lt;p&gt;In this section, we walk through a series of queries a user might perform as they explore their schema.&lt;/p&gt;
&lt;h3&gt;1. Show the available tag keys&lt;/h3&gt;
&lt;p&gt;This query uses the &lt;code class="language-markup"&gt;keys&lt;/code&gt; function to list the distinct tag keys.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;from(db:"foo")
    |&amp;gt; range(start:-1h)
    |&amp;gt; group(none:true)
    |&amp;gt; keys(except:["_time","_value","_start","_stop"])&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;_value:string
-------------
_field
_measurement
bank
dir
host
id
interface
tag0
tag1
tag2
tag3&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;2. Expand the &lt;code class="language-markup"&gt;host&lt;/code&gt; tag to see available values&lt;/h3&gt;
&lt;p&gt;This query groups the data by the &lt;code class="language-markup"&gt;host&lt;/code&gt; tag and outputs the distinct values for the host column.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;from(db:"foo")
    |&amp;gt; range(start:-1h)
    |&amp;gt; group(by:["host"])
    |&amp;gt; distinct(column:"host")
    |&amp;gt; group(none:true)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;_value:string
-------------
host2
host1&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;3. Expand &lt;code class="language-markup"&gt;host1&lt;/code&gt; tag to see available keys&lt;/h3&gt;
&lt;p&gt;This query filters by &lt;code class="language-markup"&gt;host == "host1"&lt;/code&gt; and shows the subset of available keys.&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;from(db:"foo")
    |&amp;gt; range(start:-1h)
    |&amp;gt; filter(fn:(r) =&amp;gt; r.host == "host1")
    |&amp;gt; group(none:true)
    |&amp;gt; keys(except:["_time","_value","_start","_stop", "host"]) // &amp;lt;- note host is added here, since we're already filtering on it&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;/p&gt;
&lt;pre class="line-numbers"&gt;&lt;code class="language-markup"&gt;_value:string
-------------
_field
_measurement
bank
dir
id
interface&lt;/code&gt;&lt;/pre&gt;
</description>
      <pubDate>Thu, 31 May 2018 11:50:29 -0700</pubDate>
      <link>https://www.influxdata.com/blog/schema-queries-in-ifql</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/schema-queries-in-ifql</guid>
      <category>Use Cases</category>
      <category>Developer</category>
      <category>Product</category>
      <author>Stuart Carnie (InfluxData)</author>
    </item>
    <item>
      <title>InfluxData is Building a Fast Implementation of Apache Arrow in Go Using c2goasm and SIMD</title>
      <description>&lt;p&gt;InfluxData is pleased to announce our contribution to the &lt;a href="https://arrow.apache.org/" rel="nofollow"&gt;Apache Arrow&lt;/a&gt; project. Essentially, we are contributing work that we already started: the development of a Go implementation of Apache Arrow. We believe in open source and are committed to participating in and contributing to the open source community in meaningful ways. We developed an interest in Apache Arrow for a number of reasons which we describe in more detail below, and contributing our initial efforts to the Apache Software Foundation ensures that the community maintains the focus within that repository.&lt;/p&gt;

&lt;p&gt;Apache Arrow specifies a standardized, language-independent, columnar memory format for flat and hierarchical data that is organized for efficient, analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and inter-process communication. &lt;a href="https://www.influxdata.com/blog/ifql-and-the-future-of-influxdata/"&gt;As we have been working on developing a new query processing engine&lt;/a&gt; and language for InfluxDB, currently known as &lt;a href="https://github.com/influxdata/flux" target="_blank" rel="noopener"&gt;Flux f.k.a. IFQL&lt;/a&gt;, Arrow provides a superior way to exchange data between the database and the query processing engine, while also giving InfluxData an additional means to participate in a broader ecosystem of data processing and analysis tools.&lt;/p&gt;
&lt;h2&gt;Why Arrow?&lt;/h2&gt;
&lt;p&gt;One of many goals for Flux f.k.a. IFQL is to enable new ways to efficiently query and analyze your data using industry-standard tools. One such example is &lt;em&gt;&lt;a href="https://pandas.pydata.org/" rel="nofollow"&gt;pandas&lt;/a&gt;&lt;/em&gt;, an open source library that provides advanced features for data analytics and visualization. Another is &lt;a href="https://spark.apache.org/" rel="nofollow"&gt;Apache Spark&lt;/a&gt;, a scalable data processing engine. We discovered that these and many other open source projects, as well as commercial software offerings, are adopting Apache Arrow to address the challenge of sharing columnar data efficiently. The Apache Arrow mission statement defines a number of goals that resonated with the team at InfluxData:&lt;/p&gt;
&lt;blockquote&gt;Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.&lt;/blockquote&gt;
&lt;p&gt;Specifically:&lt;/p&gt;
&lt;ul&gt;
 	&lt;li&gt;&lt;strong&gt;Standardized&lt;/strong&gt;: Many projects in the data science and analytics space are adopting Arrow as it addresses a common set of design problems including how to efficiently exchange large data sets. Examples of early adopters include &lt;em&gt;pandas&lt;/em&gt; and Spark, and the list &lt;a href="http://arrow.apache.org/powered_by/" rel="nofollow"&gt;continues to grow&lt;/a&gt;.&lt;/li&gt;
 	&lt;li&gt;&lt;strong&gt;Performance&lt;/strong&gt;: The specification is clear that performance is the &lt;em&gt;raison d'être&lt;/em&gt;. Arrow data structures are designed to work efficiently on modern processors, enabling the use of features like single-instruction, multiple-data (SIMD).&lt;/li&gt;
 	&lt;li&gt;&lt;strong&gt;Language-Independent&lt;/strong&gt;: Mature libraries exist for C/C++, Python, Java and JavaScript, with libraries for Ruby and Go in active development. More libraries mean more ways to work with your data.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We also recognize Apache Arrow as an opportunity to participate and contribute to a community that will face similar challenges. &lt;em&gt;A problem shared is a problem halved.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;&lt;a id="user-content-apache-arrow-at-influxdata" class="anchor" href="https://gist.github.com/stuartcarnie/8e2b6ee117d320c4e5045deb947ba824#apache-arrow-at-influxdata"&gt;&lt;/a&gt;Apache Arrow at InfluxData&lt;/h2&gt;
&lt;p&gt;We have identified a few areas where InfluxDB will benefit from Apache Arrow:&lt;/p&gt;
&lt;ul&gt;
 	&lt;li&gt;represent in-memory TSM columnar data,&lt;/li&gt;
 	&lt;li&gt;perform aggregations using SIMD math kernels and&lt;/li&gt;
 	&lt;li&gt;the data communication protocol between InfluxDB and Flux f.k.a. IFQL.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For Flux f.k.a. IFQL:&lt;/p&gt;
&lt;ul&gt;
 	&lt;li&gt;represent block data structures,&lt;/li&gt;
 	&lt;li&gt;perform aggregations using SIMD math kernels and&lt;/li&gt;
 	&lt;li&gt;the primary communication protocol to clients.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the future, we expect that a user could create a &lt;a href="http://jupyter.org/" rel="nofollow"&gt;Jupyter Notebook&lt;/a&gt;, execute a Flux f.k.a. IFQL query in Python and manipulate the data efficiently in &lt;em&gt;pandas&lt;/em&gt;, with little overhead.&lt;/p&gt;
&lt;h2&gt;&lt;a id="user-content-apache-arrow-in-go" class="anchor" href="https://gist.github.com/stuartcarnie/8e2b6ee117d320c4e5045deb947ba824#apache-arrow-in-go"&gt;&lt;/a&gt;Apache Arrow in Go&lt;/h2&gt;
&lt;p&gt;At the time of writing, the Go implementation has support for the following features:&lt;/p&gt;
&lt;h3&gt;&lt;a id="user-content-memory-management" class="anchor" href="https://gist.github.com/stuartcarnie/8e2b6ee117d320c4e5045deb947ba824#memory-management"&gt;&lt;/a&gt;Memory Management&lt;/h3&gt;
&lt;ul&gt;
 	&lt;li&gt;Allocations are 64-byte aligned and padded to 8 bytes&lt;/li&gt;
&lt;/ul&gt;
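As a rough illustration of that alignment rule, the padding arithmetic looks like the following sketch (the helper name is ours, not the library's):

```go
package main

import "fmt"

// roundUp pads n up to the next multiple of m, where m is a power of two.
// An Arrow-style allocator uses this kind of arithmetic to keep buffer
// lengths padded (e.g. to 8 bytes) and starting addresses aligned (e.g. to 64).
func roundUp(n, m int) int {
	return (n + m - 1) &^ (m - 1)
}

func main() {
	fmt.Println(roundUp(100, 8))  // 104: length padded to 8 bytes
	fmt.Println(roundUp(100, 64)) // 128: padded to a 64-byte boundary
}
```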
&lt;h3&gt;&lt;a id="user-content-array-and-builder-support" class="anchor" href="https://gist.github.com/stuartcarnie/8e2b6ee117d320c4e5045deb947ba824#array-and-builder-support"&gt;&lt;/a&gt;Array and Builder Support&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Primitive Types&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
 	&lt;li&gt;Signed and unsigned 8, 16, 32 and 64 bit integers&lt;/li&gt;
 	&lt;li&gt;32 and 64 bit floats&lt;/li&gt;
 	&lt;li&gt;Packed LSB booleans&lt;/li&gt;
 	&lt;li&gt;Variable-length binary arrays&lt;/li&gt;
&lt;/ul&gt;
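The "packed LSB booleans" entry refers to the Arrow bitmap layout, where each boolean occupies a single bit, least-significant bit first. A minimal sketch of the indexing (the helper names are ours, not the library's):

```go
package main

import "fmt"

// setBit and getBit address value i at bit (i % 8) of byte (i / 8),
// matching an LSB-first packed boolean layout.
func setBit(buf []byte, i int)      { buf[i/8] |= 1 << uint(i%8) }
func getBit(buf []byte, i int) bool { return buf[i/8]&(1<<uint(i%8)) != 0 }

func main() {
	buf := make([]byte, 2) // room for 16 booleans
	setBit(buf, 0)
	setBit(buf, 9)
	fmt.Println(getBit(buf, 0), getBit(buf, 1), getBit(buf, 9)) // true false true
}
```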
&lt;p&gt;&lt;strong&gt;Parametric Types&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
 	&lt;li&gt;Timestamp&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;a id="user-content-type-metadata" class="anchor" href="https://gist.github.com/stuartcarnie/8e2b6ee117d320c4e5045deb947ba824#type-metadata"&gt;&lt;/a&gt;Type Metadata&lt;/h3&gt;
&lt;ul&gt;
 	&lt;li&gt;Data types&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;a id="user-content-simd-math-kernels" class="anchor" href="https://gist.github.com/stuartcarnie/8e2b6ee117d320c4e5045deb947ba824#simd-math-kernels"&gt;&lt;/a&gt;SIMD Math Kernels&lt;/h3&gt;
&lt;ul&gt;
 	&lt;li&gt;SIMD optimized &lt;code class="language-markup"&gt;Sum&lt;/code&gt; operations for 64-bit float, int and unsigned int arrays&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;a id="user-content-simd-your-go-with-no-assembly-required-using-this-one-weird-trick" class="anchor" href="https://gist.github.com/stuartcarnie/8e2b6ee117d320c4e5045deb947ba824#simd-your-go-with-no-assembly-required-using-this-one-weird-trick"&gt;&lt;/a&gt;SIMD Your Go with No Assembly Required, Using This One Weird Trick!&lt;/h3&gt;
&lt;p&gt;Before we share the magic, let’s delve a little deeper into why SIMD, or single-instruction, multiple-data, is relevant. It is no accident that most data structures in Apache Arrow occupy contiguous blocks of memory as arrays or vectors. Using special instructions, many of today’s CPUs can process tightly packed data like this in parallel, improving the performance of specific algorithms and operations. Even better, compilers are built with a host of advanced optimizations, such as &lt;a href="https://en.wikipedia.org/wiki/Automatic_vectorization" rel="nofollow"&gt;auto vectorization&lt;/a&gt;, to take advantage of these features without the developer having to write any assembly. During compilation, the compiler may identify loops that process arrays as candidates for auto vectorization, and generate more efficient machine code utilizing SIMD instructions. Alas, the Go compiler lacks these optimizations, leaving us to fend for ourselves. We could write these routines in assembly, but that is hard enough without having to use Go’s esoteric Plan 9 syntax. To make matters worse, in order to write optimal code in assembly for a specific architecture, you must be familiar with other issues like instruction scheduling, data dependencies, &lt;a href="https://software.intel.com/en-us/articles/avoiding-avx-sse-transition-penalties" rel="nofollow"&gt;AVX-SSE transition penalties&lt;/a&gt; and more.&lt;/p&gt;
&lt;h3&gt;&lt;a id="user-content-clang--c2goasm--?" class="anchor" href="https://gist.github.com/stuartcarnie/8e2b6ee117d320c4e5045deb947ba824#clang--c2goasm--%EF%B8%8F"&gt;&lt;/a&gt;clang + c2goasm = ??&lt;/h3&gt;
&lt;p&gt;&lt;code class="language-markup"&gt;c2goasm&lt;/code&gt;, developed by minio, is an awesome command-line tool that transforms the assembly output of functions written in C/C++ into something the Go Plan 9 assembler will understand. These are &lt;em&gt;not&lt;/em&gt; the same as CGO and just as efficient to call as any other Go function. A caveat of routines written in Go assembly is they cannot be inlined, so it is important they do enough work to negate the overhead of the function call. The examples in the release announcement make use of intrinsics, which are compiler extensions that provide access to processor-specific features like wide data types (&lt;code class="language-markup"&gt;__m256&lt;/code&gt;) and functions that map to processor instructions (&lt;code class="language-markup"&gt;_mm256_load_ps&lt;/code&gt;). Using intrinsics vs. writing pure assembly allows the developer to mix high-level C code with low-level processor features, whilst still allowing the compiler to perform a limited set of optimizations.&lt;/p&gt;

&lt;p&gt;Our &lt;a href="https://github.com/stuartcarnie/go-simd"&gt;first experiment&lt;/a&gt; was to take a Go function that summed a slice of 64-bit floats and determine if we could improve it with c2goasm. We benchmarked 1,000 element slices, as they match the maximum size of a TSM block in InfluxDB. The benchmarks were collected on an early 2017 MacBook Pro running at 2.9 GHz.&lt;/p&gt;

&lt;p&gt;The reference implementation in Go ran at 1200 ns/op or 6,664 MB/s:&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code class="language-markup"&gt;func SumFloat64(buf []float64) float64 {
	acc := float64(0)
	for i := range buf {
		acc += buf[i]
	}
	return acc
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Following a similar approach to the c2goasm post, we used AVX2 intrinsics to produce this abridged implementation:&lt;/p&gt;
&lt;div class="highlight highlight-source-c"&gt;
&lt;pre class="language-markup"&gt;&lt;code class="language-markup"&gt;void sum_float64_avx_intrinsics(double buf[], size_t len, double *res) {
    __m256d acc = _mm256_set1_pd(0);
    for (int i = 0; i &amp;lt; len; i += 4) {
        __m256d v = _mm256_load_pd(&amp;amp;buf[i]);
        acc = _mm256_add_pd(acc, v);
    }

    acc = _mm256_hadd_pd(acc, acc); // a[0] = a[0] + a[1], a[2] = a[2] + a[3]
    *res = _mm256_cvtsd_f64(acc) + _mm_cvtsd_f64(_mm256_extractf128_pd(acc, 1));
}&lt;/code&gt;&lt;/pre&gt;
This version summed the 1,000 64-bit floats (&lt;code class="language-markup"&gt;double&lt;/code&gt; in C) at 255 ns/op, or a rate of 31,369 MB/s &amp;ndash; a handy 4.7× improvement. Intel x86 AVX2 intrinsics are a specific set of extensions for working with 256 bits of data, or 4×64-bit float values, using single instructions. There is quite a bit going on here, so let's summarize what the code does:

&lt;/div&gt;
&lt;div class="highlight highlight-source-c"&gt;
&lt;ul&gt;
 	&lt;li&gt;initialize the accumulator &lt;code class="language-markup"&gt;acc&lt;/code&gt;, a data type representing 4×64-bit float elements, to &lt;code class="language-markup"&gt;0&lt;/code&gt;&lt;/li&gt;
 	&lt;li&gt;for each iteration:
&lt;ul&gt;
 	&lt;li&gt;load the next 4 64-bit float elements from &lt;code class="language-markup"&gt;buf&lt;/code&gt; into &lt;code class="language-markup"&gt;v&lt;/code&gt;&lt;/li&gt;
 	&lt;li&gt;add the corresponding elements of &lt;code class="language-markup"&gt;v&lt;/code&gt; to &lt;code class="language-markup"&gt;acc&lt;/code&gt;, i.e. &lt;code class="language-markup"&gt;acc[0] += v[0]&lt;/code&gt;, &lt;code class="language-markup"&gt;acc[1] += v[1]&lt;/code&gt;, &lt;code class="language-markup"&gt;acc[2] += v[2]&lt;/code&gt; and &lt;code class="language-markup"&gt;acc[3] += v[3]&lt;/code&gt;&lt;/li&gt;
 	&lt;li&gt;if more elements remain in &lt;code class="language-markup"&gt;buf&lt;/code&gt;, increment by 4 and restart the loop&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
 	&lt;li&gt;sum &lt;code class="language-markup"&gt;acc[0]+acc[1]+acc[2]+acc[3]&lt;/code&gt;&lt;/li&gt;
 	&lt;li&gt;convert to a double and return the value&lt;/li&gt;
&lt;/ul&gt;
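The same data flow can be mirrored in scalar Go, which may make the lane-wise structure easier to follow (a sketch for illustration only, not the actual kernel):

```go
package main

import "fmt"

// sum4Lanes mimics the AVX2 kernel in scalar Go: four running partial sums
// (one per "lane"), followed by a horizontal reduction at the end.
// Like the intrinsics version, it assumes len(buf) is a multiple of 4.
func sum4Lanes(buf []float64) float64 {
	var acc [4]float64
	for i := 0; i < len(buf); i += 4 {
		acc[0] += buf[i]
		acc[1] += buf[i+1]
		acc[2] += buf[i+2]
		acc[3] += buf[i+3]
	}
	// Horizontal reduction, standing in for hadd/extract.
	return acc[0] + acc[1] + acc[2] + acc[3]
}

func main() {
	fmt.Println(sum4Lanes([]float64{1, 2, 3, 4, 5, 6, 7, 8})) // 36
}
```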
It is worth noting that there are a couple of cons to using intrinsics:
&lt;ul&gt;
 	&lt;li&gt;the cognitive load required to understand this function is much higher than the Go or plain C version&lt;/li&gt;
 	&lt;li&gt;we'll need a separate implementation using SSE4 intrinsics, as calling this function on a machine that does not support AVX2 extensions will crash with an illegal-instruction fault&lt;/li&gt;
&lt;/ul&gt;
There are situations where using intrinsics or writing assembly is the best option, but for a simple loop like this, we decided to explore an alternative. Earlier, we mentioned auto-vectorization, so let's see what an optimizing compiler can do with a plain C version, similar to the Go version:
&lt;pre class="language-markup"&gt;&lt;code class="language-markup"&gt;void sum_float64(double buf[], int len, double *res) {
    double acc = 0.0;
    for(int i = 0; i &amp;lt; len; i++) {
        acc += buf[i];
    }
    *res = acc;
}&lt;/code&gt;&lt;/pre&gt;
1,000 floats summed in just 58 ns, or a rate of 137 GB/s&lt;sup&gt;1&lt;/sup&gt;. Not too shabby, considering all we did was specify a few compiler flags to enable optimizations such as loop vectorization and loop unrolling, and to generate AVX2 instructions. By writing a portable C/C++ version, we can generate an SSE4 version or target a completely different architecture like ARM64 with only minor alterations to the compiler flags; a benefit that cannot be overstated.

&lt;sup&gt;1 &lt;/sup&gt;According to the specs for the &lt;a href="https://ark.intel.com/products/88972/Intel-Core-i7-6920HQ-Processor-8M-Cache-up-to-3_80-GHz" rel="nofollow"&gt;Intel Core i7 6920HQ&lt;/a&gt;, it has a maximum memory bandwidth of 34.1 GB/s. 137 GB/s is well above this number, so what is going on? &lt;em&gt;Caching&lt;/em&gt;. We can attribute the blazing speed to the data residing in one of the processor caches. Therefore, &lt;em&gt;the sooner you operate on data read from main memory, the more likely you will benefit from caching&lt;/em&gt;.
&lt;h3&gt;&lt;a id="user-content-automating-the-code-generation" class="anchor" href="https://gist.github.com/stuartcarnie/8e2b6ee117d320c4e5045deb947ba824#automating-the-code-generation"&gt;&lt;/a&gt;Automating the Code Generation&lt;/h3&gt;
There are a few steps required to go from the C source to the final Go assembly:
&lt;ol&gt;
 	&lt;li&gt;execute &lt;code class="language-markup"&gt;clang&lt;/code&gt; with the correct compiler flags to generate the base assembly, producing &lt;code class="language-markup"&gt;foo_ARCH.s&lt;/code&gt;&lt;/li&gt;
 	&lt;li&gt;execute &lt;code class="language-markup"&gt;c2goasm&lt;/code&gt; to transform &lt;code class="language-markup"&gt;foo_ARCH.s&lt;/code&gt; into Go assembly&lt;/li&gt;
 	&lt;li&gt;repeat 1 and 2 for each target architecture (e.g. SSE4, AVX2 or ARM64)&lt;/li&gt;
&lt;/ol&gt;
If "A" changes, build "B"; if "B" changes, build "C". Sounds like a job for &lt;code class="language-markup"&gt;make&lt;/code&gt; and that is exactly what we did. Any time we update the C source, we simply run &lt;code class="language-markup"&gt;make generate&lt;/code&gt; to update the dependent files. We also check the generated assembly files in to the repository to ensure the Go package is &lt;em&gt;go gettable&lt;/em&gt;.
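A sketch of what such a Makefile might look like (the file names and compiler flags here are illustrative, not the project's actual build rules):

```makefile
# Illustrative only: regenerate the Go assembly whenever the C source changes.
_lib/sum_float64_avx2.s: _lib/sum_float64.c
	clang -O3 -mavx2 -masm=intel -S $< -o $@

sum_float64_avx2_amd64.s: _lib/sum_float64_avx2.s
	c2goasm -a $< $@

generate: sum_float64_avx2_amd64.s
.PHONY: generate
```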
&lt;h3&gt;&lt;a id="user-content-using-these-optimizations-in-go" class="anchor" href="https://gist.github.com/stuartcarnie/8e2b6ee117d320c4e5045deb947ba824#using-these-optimizations-in-go"&gt;&lt;/a&gt;Using These Optimizations in Go&lt;/h3&gt;
If the AVX2 version of the function is called on a processor that does not support these extensions, your program will crash, which isn't ideal. The solution is to determine which processor features are available at runtime and call the appropriate function, falling back to the pure Go version if necessary. The Go runtime does this in a number of places using the &lt;a href="https://github.com/golang/go/tree/master/src/internal/cpu"&gt;internal/cpu package&lt;/a&gt;, and we took a similar approach with some improvements. At startup, the most efficient functions are selected based on available processor features; however, if an environment variable named &lt;code class="language-markup"&gt;INTEL_DISABLE_EXT&lt;/code&gt; is present, the specified optimizations are disabled. If this is of interest to you, we've documented the feature in the repository. For example, to disable AVX2 and use the next best set of features for a hypothetical application, &lt;code class="language-markup"&gt;myapp&lt;/code&gt;:
&lt;pre class="language-markup"&gt;&lt;code class="language-markup"&gt;$ INTEL_DISABLE_EXT=AVX2 myapp&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;There is still plenty of work to be done to reach feature parity with the C++ implementation of Apache Arrow and we look forward to sharing our future contributions.&lt;/p&gt;
</description>
      <pubDate>Thu, 22 Mar 2018 03:36:31 -0700</pubDate>
      <link>https://www.influxdata.com/blog/influxdata-apache-arrow-go-implementation</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/influxdata-apache-arrow-go-implementation</guid>
      <category>Product</category>
      <category>Use Cases</category>
      <category>Developer</category>
      <category>Company</category>
      <author>Stuart Carnie (InfluxData)</author>
    </item>
    <item>
      <title>Logging Improvements for InfluxDB 1.5.0</title>
      <description>&lt;p&gt;When it comes to troubleshooting issues, log files are a high-value asset. If things go wrong, you’ll almost always be asked to “send the logs”. InfluxDB 1.5 comes with a number of improvements to logging in an effort to simplify the task of analyzing this data.&lt;/p&gt;
&lt;h2&gt;Logging&lt;/h2&gt;
&lt;p&gt;InfluxDB generates a lot of log output that chronicles many aspects of its internal operation. This latest update has revamped how the database logs key information and the format of the log output. The primary goal of these changes is to enable tooling which can efficiently parse and analyze the log data and reduce the time required to diagnose issues.&lt;/p&gt;
&lt;h2&gt;Structured Logging&lt;/h2&gt;
&lt;p&gt;InfluxDB 1.5 can generate structured logs in either &lt;a href="https://brandur.org/logfmt"&gt;logfmt&lt;/a&gt; or &lt;a href="http://jsonlines.org/"&gt;JSON lines&lt;/a&gt;. This feature is found in a new section of the configuration file titled &lt;code class="language-markup"&gt;[logging]&lt;/code&gt;. The default log format is &lt;code class="language-markup"&gt;auto&lt;/code&gt;, which determines the output based on whether &lt;code class="language-markup"&gt;stderr&lt;/code&gt; refers to a terminal (TTY) or not. If &lt;code class="language-markup"&gt;stderr&lt;/code&gt; is a TTY, a less verbose “console” format is selected; otherwise, the output will be &lt;code class="language-markup"&gt;logfmt&lt;/code&gt;. If you would prefer consistent output, we would encourage you to explicitly set the format to &lt;code class="language-markup"&gt;logfmt&lt;/code&gt;.&lt;/p&gt;
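For example, a minimal `[logging]` section pinning the format might look like this (a sketch; consult the sample configuration file for the full set of options):

```toml
[logging]
  # Always emit machine-parseable logfmt, even when stderr is a TTY.
  format = "logfmt"
  level = "info"
```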

&lt;p&gt;The value of structured logging is greatly improved when specific elements of a log event can be extracted easily. With that in mind, the existing logging code was reviewed to ensure notable data was moved to separate keys. The following example shows a few startup events related to opening files in version 1.3. Note some entries include the path and duration embedded in the message:&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code class="language-markup"&gt;[I] 2018-03-03T00:46:48Z /Users/stuartcarnie/.influxdb/data/db/autogen/510/000000001-000000001.tsm (#0) opened in 280.401827ms engine=tsm1 service=filestore
[I] 2018-03-03T00:46:48Z reading file /Users/stuartcarnie/.influxdb/wal/db/autogen/510/_00001.wal, size 15276 engine=tsm1 service=cacheloader
[I] 2018-03-03T00:46:54Z /Users/stuartcarnie/.influxdb/data/db/autogen/510 opened in 5.709152422s service=store&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As of 1.5, the same events using &lt;code class="language-markup"&gt;logfmt&lt;/code&gt; now look like:&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code class="language-markup"&gt;ts=2018-03-03T00:48:57.246371Z lvl=info msg="Opened file" log_id=06bLAgTG000 engine=tsm1 service=filestore path=/Users/stuartcarnie/.influxdb/data/db/autogen/510/000000001-000000001.tsm id=0 duration=61.736ms
ts=2018-03-03T00:48:57.246590Z lvl=info msg="Reading file" log_id=06bLAgTG000 engine=tsm1 service=cacheloader path=/Users/stuartcarnie/.influxdb/wal/db/autogen/510/_00001.wal size=15276
ts=2018-03-03T00:49:01.023313Z lvl=info msg="Opened shard" log_id=06bLAgTG000 service=store trace_id=06bLAgv0000 op_name=tsdb_open path=/Users/stuartcarnie/.influxdb/data/db/autogen/510 duration=3841.275ms&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Take note that &lt;code class="language-markup"&gt;path&lt;/code&gt; and &lt;code class="language-markup"&gt;duration&lt;/code&gt; are now separate keys. Using a tool like &lt;a href="https://blog.heroku.com/hutils-explore-your-structured-data-logs"&gt;lcut&lt;/a&gt;, we can select specific keys (&lt;code class="language-markup"&gt;ts&lt;/code&gt;, &lt;code class="language-markup"&gt;msg&lt;/code&gt;, &lt;code class="language-markup"&gt;path&lt;/code&gt; and &lt;code class="language-markup"&gt;duration&lt;/code&gt;) to reformat the output:&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code class="language-markup"&gt;2018-03-03T00:48:57.246371Z  Opened   file   /Users/stuartcarnie/.influxdb/data/db/autogen/510/000000001-000000001.tsm  61.736ms
2018-03-03T00:48:57.246590Z  Reading  file   /Users/stuartcarnie/.influxdb/wal/db/autogen/510/_00001.wal
2018-03-03T00:49:01.023313Z  Opened   shard  /Users/stuartcarnie/.influxdb/data/db/autogen/510                          3841.275ms&lt;/code&gt;&lt;/pre&gt;
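Extracting keys like this is also straightforward to script yourself; below is a simplified logfmt field parser in Go (a sketch that handles quoted values but not escape sequences):

```go
package main

import (
	"fmt"
	"strings"
)

// logfmtFields does a simplified logfmt parse: space-separated key=value
// pairs, where double-quoted values may contain spaces. A real parser
// would also handle escape sequences.
func logfmtFields(line string) map[string]string {
	fields := map[string]string{}
	for len(line) > 0 {
		line = strings.TrimLeft(line, " ")
		eq := strings.IndexByte(line, '=')
		if eq < 0 {
			break
		}
		key := line[:eq]
		rest := line[eq+1:]
		var val string
		switch {
		case strings.HasPrefix(rest, `"`):
			if end := strings.IndexByte(rest[1:], '"'); end >= 0 {
				val, rest = rest[1:1+end], rest[end+2:]
			} else {
				val, rest = rest[1:], ""
			}
		default:
			if sp := strings.IndexByte(rest, ' '); sp >= 0 {
				val, rest = rest[:sp], rest[sp:]
			} else {
				val, rest = rest, ""
			}
		}
		fields[key] = val
		line = rest
	}
	return fields
}

func main() {
	line := `ts=2018-03-03T00:48:57.246371Z lvl=info msg="Opened file" path=/a/b.tsm duration=61.736ms`
	f := logfmtFields(line)
	fmt.Println(f["ts"], f["msg"], f["path"], f["duration"])
}
```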
&lt;h2&gt;Collating Log Events&lt;/h2&gt;
&lt;p&gt;During the lifetime of an InfluxDB process, operations such as compactions run continuously and generate multiple events as they advance. To further complicate matters, when these operations run concurrently, the events from each will be interleaved together. Determining the outcome of a specific compaction is practically impossible, as there is no way to attribute interleaved log events to the operation that produced them. To address this issue, InfluxDB 1.5 adds the keys listed in the following table:&lt;/p&gt;
&lt;table width="683"&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;key&lt;/strong&gt;&lt;/td&gt;
&lt;td width="581"&gt;&lt;strong&gt;comment&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code class="language-markup"&gt;trace_id&lt;/code&gt;&lt;/td&gt;
&lt;td width="581"&gt;A unique value associated with each log event for a single run of an operation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code class="language-markup"&gt;op_name&lt;/code&gt;&lt;/td&gt;
&lt;td width="581"&gt;A searchable identifier, such as  &lt;code class="language-markup"&gt;tsm1.compact_group&lt;/code&gt; assigned to each operation type&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code class="language-markup"&gt;op_event&lt;/code&gt;&lt;/td&gt;
&lt;td width="581"&gt;Given a value of &lt;code class="language-markup"&gt;start&lt;/code&gt; or &lt;code class="language-markup"&gt;end&lt;/code&gt; to indicate whether this is the first or last event of the operation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code class="language-markup"&gt;op_elapsed&lt;/code&gt;&lt;/td&gt;
&lt;td width="581"&gt;The time an operation took to complete, in milliseconds. This key is always included with the &lt;code class="language-markup"&gt;op_event=end&lt;/code&gt; log event&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;To demonstrate how these keys might be used, we’ll use the following (abridged) file named “influxd.log”, which includes at least two compactions running concurrently.&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code class="language-markup"&gt;msg="TSM compaction (start)" trace_id=06avQESl000 op_name=tsm1_compact_group op_event=start
msg="Beginning compaction" trace_id=06avQESl000 op_name=tsm1_compact_group tsm1_files_n=2
msg="Compacting file" trace_id=06avQESl000 op_name=tsm1_compact_group tsm1_index=0 tsm1_file=/influxdb/data/db1/rp/10/000000859-000000002.tsm
msg="Compacting file" trace_id=06avQESl000 op_name=tsm1_compact_group tsm1_index=1 tsm1_file=/influxdb/data/db1/rp/10/000000861-000000001.tsm
msg="TSM compaction (start)" trace_id=06avQEZW000 op_name=tsm1_compact_group op_event=start
msg="Beginning compaction" trace_id=06avQEZW000 op_name=tsm1_compact_group tsm1_files_n=2
msg="invalid subscription token" service=subscriber
msg="Post http://kapacitor-rw:9092/write?consistency=&amp;amp;db=_internal&amp;amp;precision=ns&amp;amp;rp=monitor: dial tcp: lookup kapacitor-rw on 10.0.0.1: server misbehaving" service=subscriber
msg="Compacting file" trace_id=06avQEZW000 op_name=tsm1_compact_group tsm1_index=0 tsm1_file=/influxdb/data/db2/rp/12/000000027-000000002.tsm
msg="Compacting file" trace_id=06avQEZW000 op_name=tsm1_compact_group tsm1_index=1 tsm1_file=/influxdb/data/db2/rp/12/000000029-000000001.tsm
msg="Compacted file" trace_id=06avQEZW000 op_name=tsm1_compact_group tsm1_index=0 tsm1_file=/influxdb/data/db2/rp/12/000000029-000000002.tsm.tmp
msg="Finished compacting files" trace_id=06avQEZW000 op_name=tsm1_compact_group tsm1_files_n=1
msg="TSM compaction (end)" trace_id=06avQEZW000 op_name=tsm1_compact_group op_event=end op_elapsed=56.907ms
msg="Compacted file" trace_id=06avQESl000 op_name=tsm1_compact_group tsm1_index=0 tsm1_file=/influxdb/data/db1/rp/10/000000861-000000002.tsm.tmp
msg="Finished compacting files" trace_id=06avQESl000 op_name=tsm1_compact_group tsm1_files_n=1
msg="TSM compaction (end)" trace_id=06avQESl000 op_name=tsm1_compact_group op_event=end op_elapsed=157.739ms&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Compactions are identified by &lt;code class="language-markup"&gt;op_name=tsm1_compact_group&lt;/code&gt;, so to summarize them, we might use the following command to output the trace id and elapsed time:&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code class="language-markup"&gt;$ fgrep 'tsm1_compact_group' influxd.log | fgrep 'op_event=end' | lcut trace_id op_elapsed&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;which can be read as:&lt;/p&gt;
&lt;blockquote&gt;&lt;em&gt;Find all the lines containing the text&lt;/em&gt; &lt;em&gt;&lt;code class="language-markup"&gt;tsm1_compact_group&lt;/code&gt; &lt;/em&gt;and&lt;em&gt; &lt;code class="language-markup"&gt;op_event=end&lt;/code&gt; and display the &lt;code class="language-markup"&gt;trace_id&lt;/code&gt; and &lt;code class="language-markup"&gt;op_elapsed&lt;/code&gt; keys&lt;/em&gt;&lt;/blockquote&gt;
&lt;p&gt;and would produce the following output:&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code class="language-markup"&gt;06avQEZW000	56.907ms
06avQESl000	157.739ms&lt;/code&gt;&lt;/pre&gt;
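&lt;p&gt;The same summary can be produced programmatically. The following Python sketch mirrors the &lt;code class="language-markup"&gt;fgrep | fgrep | lcut&lt;/code&gt; pipeline above; the &lt;code class="language-markup"&gt;summarize&lt;/code&gt; helper is hypothetical, and it assumes (as in our sample) that the keys of interest appear as unquoted &lt;code class="language-markup"&gt;key=value&lt;/code&gt; tokens:&lt;/p&gt;

```python
import re

# Bare key=value tokens are enough for the keys used here; quoted values
# (such as msg) are not needed for this summary.
PAIR = re.compile(r'(\w+)=(\S+)')

def summarize(lines, op_name):
    """List (trace_id, op_elapsed) for every completed run of op_name.

    Equivalent in spirit to:
        fgrep OP_NAME influxd.log | fgrep 'op_event=end' | lcut trace_id op_elapsed
    """
    runs = []
    for line in lines:
        pairs = dict(PAIR.findall(line))
        if pairs.get("op_name") == op_name and pairs.get("op_event") == "end":
            runs.append((pairs["trace_id"], pairs["op_elapsed"]))
    return runs
```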
&lt;p&gt;From here it is easy to filter the logs for trace &lt;code class="language-markup"&gt;06avQESl000&lt;/code&gt; using the following:&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code class="language-markup"&gt;$ fgrep '06avQESl000' influxd.log | lcut msg tsm1_file
TSM compaction (start)
Beginning compaction
Compacting file	/influxdb/data/db1/rp/10/000000859-000000002.tsm
Compacting file	/influxdb/data/db1/rp/10/000000861-000000001.tsm
Compacted file	/influxdb/data/db1/rp/10/000000861-000000002.tsm.tmp
Finished compacting files
TSM compaction (end)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can ask more complex questions, such as:&lt;/p&gt;
&lt;blockquote&gt;&lt;em&gt;What are the top 10 slowest continuous queries?&lt;/em&gt;&lt;/blockquote&gt;
&lt;p&gt;using a command like:&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code class="language-markup"&gt;$ fgrep 'continuous_querier_execute' influxd.log | fgrep 'op_event=end' | lcut trace_id op_elapsed | sort -r -h -k2 | head -10
06eXrSJG000	15007.940ms
06d7Ow3W000	15007.646ms
06axkRVG000	15007.222ms
06ay9170000	15007.118ms
06c9tbwG000	15006.701ms
06dUcXhG000	15006.533ms
06ekMi40000	15006.158ms
06c5FH7l000	15006.145ms
06bDHhkG000	15006.012ms
06a~ioYG000	15005.988ms&lt;/code&gt;&lt;/pre&gt;
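&lt;p&gt;The ranking step can likewise be sketched in Python. This illustrative &lt;code class="language-markup"&gt;slowest&lt;/code&gt; function (our own name, not an InfluxDB utility) plays the role of &lt;code class="language-markup"&gt;sort -r -h -k2 | head&lt;/code&gt;, assuming each &lt;code class="language-markup"&gt;op_event=end&lt;/code&gt; event carries an &lt;code class="language-markup"&gt;op_elapsed&lt;/code&gt; value in milliseconds:&lt;/p&gt;

```python
import re

END = re.compile(r'op_event=end\b')
TRACE = re.compile(r'trace_id=(\S+)')
ELAPSED = re.compile(r'op_elapsed=([0-9.]+)ms')

def slowest(lines, n=10):
    """Rank completed operations by op_elapsed, slowest first."""
    runs = []
    for line in lines:
        elapsed = ELAPSED.search(line)
        if END.search(line) and elapsed:
            trace = TRACE.search(line)
            runs.append((float(elapsed.group(1)),
                         trace.group(1) if trace else ""))
    runs.sort(reverse=True)  # largest elapsed time first
    return [(trace_id, "%.3fms" % ms) for ms, trace_id in runs[:n]]
```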
&lt;p&gt;In summary, structured logging enables log data to be analyzed far more efficiently with off-the-shelf tooling.&lt;/p&gt;
&lt;h2&gt;HTTP Access Logs&lt;/h2&gt;
&lt;p&gt;InfluxDB has long supported the ability to output HTTP request traffic in &lt;a href="https://en.wikipedia.org/wiki/Common_Log_Format"&gt;Common Log Format&lt;/a&gt;. This feature is enabled by setting the &lt;code class="language-markup"&gt;log-enabled&lt;/code&gt; option to &lt;code class="language-markup"&gt;true&lt;/code&gt; in the &lt;code class="language-markup"&gt;[http]&lt;/code&gt; section of the InfluxDB configuration.&lt;/p&gt;

&lt;p&gt;Prior to 1.5, all log output was sent to &lt;code class="language-markup"&gt;stderr&lt;/code&gt; and looked something like the following:&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code class="language-markup"&gt;[I] 2018-03-02T19:59:58Z compacted level 1 4 files into 1 files in 130.832391ms engine=tsm1
[I] 2018-03-02T20:00:09Z SELECT count(v0) FROM db.autogen.m0 WHERE time &amp;gt;= '2018-02-27T01:00:00Z' AND time &amp;lt; '2018-02-27T02:00:00Z' GROUP BY * service=query
[httpd] ::1 - - [02/Mar/2018:13:00:09 -0700] "GET /query?db=db&amp;amp;q=select+count%28v0%29+from+m0+where+time+%3E%3D+%272018-02-27T01%3A00%3A00Z%27+and+time+%3C+%272018-02-27T02%3A00%3A00Z%27+group+by+%2A HTTP/1.1" 200 0 "-" "curl/7.54.0" 4f39378e-1e54-11e8-8001-000000000000 726
[I] 2018-03-02T20:00:57Z retention policy shard deletion check commencing service=retention&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because these log streams were intermingled, additional work was required to separate them before any analysis could begin. The latest release adds a configuration option to write the access log to a separate file. For example, the following configuration writes the access log to a file located at &lt;code class="language-markup"&gt;/var/log/influxd/access.log&lt;/code&gt;:&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code class="language-markup"&gt;[http]
  # ...
  log-enabled = true
  access-log-path = "/var/log/influxd/access.log"
  # ...&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When the access log is written to a file, the &lt;code class="language-markup"&gt;[http]&lt;/code&gt; prefix is stripped so that the file can be parsed by standard HTTP log analysis and monitoring tools without further processing. For example, using &lt;a href="http://lnav.org/"&gt;lnav&lt;/a&gt;, an admin can open an active log file and display the data using a number of different visualizations, including error rates and histograms, as demonstrated in the following &lt;a href="https://asciinema.org/a/thlhpQmHbEOdy4QOGL3zapgLa" target="_blank" rel="noopener noreferrer"&gt;asciinema&lt;/a&gt;.&lt;/p&gt;
</description>
      <pubDate>Tue, 06 Mar 2018 18:18:27 -0700</pubDate>
      <link>https://www.influxdata.com/blog/logging-improvements-for-influxdb-1-5-0</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/logging-improvements-for-influxdb-1-5-0</guid>
      <category>Product</category>
      <category>Developer</category>
      <category>Company</category>
      <author>Stuart Carnie (InfluxData)</author>
    </item>
  </channel>
</rss>
