4 Unique Time Series Workloads for InfluxDB, Powered by IOx
By Jason Myers / Mar 03, 2023 / InfluxDB IOx
Different time series workloads
Data is a bit like Newton’s first law of motion: it remains just data unless acted upon by something else. Time series data, therefore, is something you derive from data. We generally derive time series data to record historical observations about a physical or virtual system (think of sensors and servers, respectively). However, not all time series data is the same. There are different use cases for time series data, and each has its own workload needs. The way a database stores and processes this data varies with the use case.
Let’s be clear: it is very difficult for one solution to handle all the different types of time series data and their respective workloads at the same time, but that doesn’t mean it’s impossible. If we go back to the idea of deriving time series from data, when we store those derivations we get metrics. For the other workloads discussed below (raw events, traces, and logs), the underlying data is just that: data. The queries we run against that underlying data ultimately determine the workload necessary to generate, work with, and analyze the resulting time series.
With the new InfluxDB IOx database engine, InfluxDB expands the number of time series data use cases it can handle. Let’s take a quick look at some of the key types of observability data and how InfluxDB enables users to get value from them.
Metrics
This is the area where our previous storage engine excelled, and it only got better with the new InfluxDB IOx engine. Metrics are essentially the result of polling an existing event stream at a fixed interval to gain visibility into a system, process, or other data source. This type of data is the basis for many observability processes. InfluxDB can ingest billions of data points per second and enables real-time analysis of that data.
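The fixed-interval polling described above can be sketched in plain Python. The event stream, interval, and variable names are invented for illustration; this is not InfluxDB’s API:

```python
# A raw event stream as irregular (timestamp_seconds, value) pairs,
# sorted by timestamp. All values here are made up for illustration.
events = [(1, 0.5), (3, 0.7), (12, 0.9), (14, 0.4), (27, 1.1)]

poll_interval = 10  # seconds between polls

# Poll at t=10, 20, 30: record the latest value seen at each poll time.
metrics = []
for poll_ts in range(poll_interval, 31, poll_interval):
    visible = [v for t, v in events if t <= poll_ts]
    if visible:
        metrics.append((poll_ts, visible[-1]))

print(metrics)  # [(10, 0.7), (20, 0.4), (30, 1.1)]
```

Notice that the metric series keeps only one sample per interval; the intermediate events at t=3 and t=12 are invisible to it, which is exactly what the raw-events workload below avoids.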
Raw events
Raw events are related to metrics, but with the new capabilities of InfluxDB powered by IOx, users no longer need to settle for polling a stream of event data. Instead, they can collect all of that raw event data, regardless of how granular it is, and compute metrics on the fly. This is possible because we optimized the new query engine to perform fast analytical queries on this type of data. Again, this is a different workload than simply collecting metrics. But because InfluxDB, powered by IOx, is a columnar database that uses an in-memory storage tier to cache leading-edge, “hot” data, it delivers sub-second query responses.
This means that users can look at things like a percentile of data in specific buckets for the past three hours. They can derive these summaries on the fly and scale these types of queries as needed. The key point here is that by having the entire data stream available for querying, rather than metrics only, users can slice and dice their data in any way they want, across different dimensions.
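A minimal sketch of that kind of on-the-fly summary, assuming raw events arrive as (timestamp, value) pairs and using a nearest-rank percentile. The function names and data are hypothetical, not part of InfluxDB:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

def p95_by_hour(events):
    """Group (timestamp_seconds, value) events into one-hour buckets
    and compute the 95th percentile of each bucket."""
    buckets = {}
    for ts, value in events:
        buckets.setdefault(ts // 3600, []).append(value)
    return {hour * 3600: percentile(vals, 95)
            for hour, vals in sorted(buckets.items())}

# Invented events: two in the first hour, two in the second.
events = [(60, 10), (120, 50), (3700, 5), (3800, 7)]
summary = p95_by_hour(events)
print(summary)  # {0: 50, 3600: 7}
```

Because the full event stream is retained, the same data could just as easily be re-bucketed by minute, or summarized with a different percentile, without re-collecting anything.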
Traces
Distributed tracing is a use case that previous versions of InfluxDB could technically handle, but often struggled with in terms of performance. The reason is the cardinality problem. One of the key pieces of information in tracing is the span ID, which fits best as a tag in the InfluxDB data model.
While you may have a finite number of spans in your application, the values for each ID can be anything. Having unbounded tag values creates runaway cardinality, which affects overall performance. So, to enable tracing, we had to solve the cardinality issue, which we did with the InfluxDB IOx engine.
Check out this post for more details on how the new engine addresses cardinality on a technical level. Needless to say, high cardinality creates a very different workload for traces, but InfluxDB IOx has the capabilities to handle tracing workloads without a decline in performance.
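To see why unbounded tag values blow up cardinality, here is a hypothetical sketch: series cardinality is the number of distinct tag-set combinations, and a trace-style tag whose value is unique per point makes every point its own series. The tag names and counts are invented, not InfluxDB’s schema:

```python
def series_cardinality(points):
    """points: list of dicts mapping tag key -> tag value.
    Returns the number of distinct tag-set combinations."""
    return len({tuple(sorted(p.items())) for p in points})

# Bounded tags: three hosts in one region repeat across 1000 points.
bounded = [{"host": f"h{i % 3}", "region": "us"} for i in range(1000)]

# A trace-style tag: every point carries a unique span_id value.
unbounded = [{"host": f"h{i % 3}", "span_id": f"s{i}"} for i in range(1000)]

print(series_cardinality(bounded))    # 3
print(series_cardinality(unbounded))  # 1000
```

The bounded schema stays at three series no matter how many points arrive; the span-ID schema grows linearly with the data, which is the workload shape the IOx engine was built to absorb.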
Logs
While still time series data, logs are a different animal altogether. The InfluxDB IOx engine can handle log data, but it’s not yet optimized for log workloads. Log data resembles tracing data in that it typically has very high cardinality, but the structure of that high-cardinality data is different.
For logs, not only are tag values unbounded, but tag keys are unbounded as well. The InfluxDB IOx engine stores each tag or field value in its own column, which makes it faster to store, compress, access, and analyze that data. This also reduces the total number of columns because tag and field keys tend to be consistent. However, unbounded tag keys increase the number of columns written to disk.
Imagine you’re collecting debugging logs from an application in JSON format. Any number of parameters could throw an error. Because multiple developers build and maintain the application, keeping a consistent parameter naming scheme is challenging, so a lot of variation occurs. Every variation in a parameter name creates a new tag key. The tag values for these key-value pairs can also be very long strings, which increases cardinality even more.
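As a toy illustration of unbounded tag keys, consider a few invented JSON log lines where developers named the same fields differently. In a columnar layout, each distinct key becomes its own column:

```python
import json

# Invented log lines: three spellings of the user field (userId,
# user_id, uid) and two of the message field (msg, message).
log_lines = [
    '{"level": "error", "userId": "u1", "msg": "timeout"}',
    '{"level": "error", "user_id": "u2", "message": "timeout"}',
    '{"level": "warn", "uid": "u3", "msg": "retrying"}',
]

# Collect every distinct key across all lines; in a columnar store,
# each one would widen the table by another column.
columns = set()
for line in log_lines:
    columns.update(json.loads(line).keys())

print(sorted(columns))
```

Three log lines already produce six columns; at production log volumes, this key sprawl is what makes log workloads structurally harder than trace workloads.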
As with anything, the way this plays out in reality will vary with the type of log data you collect, how you parse it, and how consistent the parameters and error name-types are.
The structure of time series data can be very different depending on your data source and what you ultimately want to accomplish. When it comes to observability, monitoring, events, traces, and logs all play critical, and oftentimes interconnected, roles. At the same time, it’s important to consider the different workloads around each data type. Fortunately, we built and optimized InfluxDB, powered by IOx, to maximize the number and type of time series workloads available in a single database. The InfluxDB IOx engine unlocks use cases that were either impossible or very difficult with the previous database engine, and makes them performant.
What are you building with InfluxDB, powered by IOx? Tell us about it and you might even get some free swag!