InfluxDays Recap – Paul Dix and the Journey of InfluxDB
Jason Myers
Product, Use Cases
Nov 03, 2022
According to the old adage, life’s a journey, not a destination. The same can be said for software. It’s unlikely that any developer would ever say that something they built was truly done. There are always bugs to squash, features to add, and updates to implement. InfluxData is a company intensely focused on time and the context of time, so it comes as little surprise that these themes played a significant role in Paul Dix’s presentation for InfluxDays.
Paul identified two specific functions for time series databases. The first is storage: a time series database is a place to store values and metadata. The second is computation: a time series database uses raw data to compute results on the fly. When InfluxDB first came onto the market, the options available for handling time series data enabled the first function, storage of time series data.
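To make the second function concrete, here is a minimal Python sketch of computing on the fly from raw data. It is an illustration only, not InfluxDB code: the sample points and the `windowed_mean` helper are hypothetical. The idea is that when raw points are kept, the window size can be chosen at query time rather than at write time.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw points as (timestamp_in_seconds, value) pairs.
raw_points = [(0, 10.0), (4, 14.0), (11, 20.0), (17, 22.0), (25, 30.0)]


def windowed_mean(points, window_seconds):
    """Group raw points into fixed windows and average each window."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Snap the timestamp down to the start of its window.
        window_start = timestamp - timestamp % window_seconds
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# The same raw data answers two different questions at query time.
ten_second = windowed_mean(raw_points, 10)
thirty_second = windowed_mean(raw_points, 30)
print(ten_second)
print(thirty_second)
```

Because nothing is pre-aggregated, a third window size would need no change to how the data was stored.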
In the beginning…
Solutions like RRDtool and Graphite utilized variations on a round robin database structure. However, a round robin file format assumes that all time series are regular, with values arriving at fixed intervals. Events, which occur at unpredictable times, show that this isn’t the case.
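The regularity assumption can be sketched with a toy round robin store in Python. This is a simplified illustration and not the actual RRDtool or Graphite file format; the `RoundRobinSeries` class and its parameters are hypothetical. Each fixed time step owns exactly one slot, so regular samples fit neatly while irregular events collide or leave gaps.

```python
class RoundRobinSeries:
    """Toy fixed-size circular store with one slot per regular time step."""

    def __init__(self, step_seconds, slots):
        self.step = step_seconds
        self.slots = [None] * slots

    def write(self, timestamp, value):
        # Snap the timestamp to its step, then wrap around the ring.
        # Once the ring is full, the oldest slot is overwritten.
        index = (timestamp // self.step) % len(self.slots)
        self.slots[index] = value


rrd = RoundRobinSeries(step_seconds=10, slots=6)

# Regular metric samples: one per 10-second step, each gets its own slot.
for ts, cpu in [(0, 0.31), (10, 0.35), (20, 0.40)]:
    rrd.write(ts, cpu)

# Irregular events: both land in the 30-39 second step, so the second
# overwrites the first, and the steps at 40 and 50 seconds stay empty.
rrd.write(31, "login failed")
rrd.write(37, "disk full")

print(rrd.slots)
```

In this sketch the "login failed" event is silently lost, which is the kind of mismatch between a regular slot layout and irregular events that the paragraph above describes.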
The data model for OpenTSDB, another early time series database, didn’t rely on a round robin data structure. The OpenTSDB data model gets closer to what InfluxDB utilizes in line protocol, but it could only handle a handful of metadata tags without affecting performance.
This was the lay of the land in the world of time series databases when InfluxData introduced InfluxDB in 2013. One of the key differences InfluxDB offered was support for both metrics and events. We did this in an attempt to address the two key functions of time series databases – storing data and computing on the fly.
As InfluxDB evolved, we developed the line protocol data model that allowed for fast ingestion and extensive tagging. Ultimately, the early version of InfluxDB had fast lookups and query execution for low-cardinality data sets, but higher-cardinality data sets were slower and more expensive to compute.
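As a rough sketch of what line protocol and cardinality mean in practice, the Python below builds a few lines in the general shape `measurement,tag=value field=value timestamp` and counts the distinct series they produce. The `to_line` and `series_key` helpers are hypothetical, skip the escaping and type suffixes that real line protocol defines, and are not part of any InfluxDB client library.

```python
from itertools import product


def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point in a simplified line protocol shape (no escaping)."""
    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={value}" for key, value in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


def series_key(line):
    """A series is identified by the measurement plus its tag set,
    which is everything before the first space."""
    return line.split(" ", 1)[0]


hosts = ["a", "b", "c"]
regions = ["us", "eu"]

# One point for every host/region combination, with an arbitrary
# nanosecond timestamp.
lines = [
    to_line("cpu", {"host": h, "region": r}, {"usage": 0.5}, 1667433600000000000)
    for h, r in product(hosts, regions)
]

# Cardinality is the number of distinct series: 3 hosts x 2 regions.
cardinality = len({series_key(line) for line in lines})
print(lines[0])
print(cardinality)
```

Cardinality grows multiplicatively with each tag: adding a third tag with 100 distinct values to this example would turn 6 series into 600, which gives a sense of why high-cardinality data sets became expensive.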
…into the future
This historical review of InfluxDB is important because it contextualizes the future of the platform. The second major iteration of InfluxDB did a pretty good job of addressing the time series database functions, but the cardinality issue persisted as a roadblock for the type of performance and use case support users want from a time series database.
The newest iteration of InfluxDB, powered by IOx, relies on a columnar structure. It uses Apache Parquet for per-column compression and encoding, which allows for more efficient data compression. Parquet is just one of many major improvements made to InfluxDB, but taken together these updates effectively eliminate cardinality limits. InfluxDB IOx is truly a time series database that provides both storage and the ability to compute time series on the fly from high precision raw data.
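To see why a columnar layout helps compression, consider this toy Python sketch. It does not use Parquet or IOx; the sample rows and the `to_columns` and `run_length_encode` helpers are hypothetical, and run-length encoding stands in here for the family of per-column encodings a format like Parquet can apply. Storing each column contiguously puts similar values next to each other, where they compress well.

```python
from itertools import groupby

# Hypothetical points stored row by row.
rows = [
    {"time": 1, "host": "a", "region": "us", "usage": 0.50},
    {"time": 2, "host": "a", "region": "us", "usage": 0.52},
    {"time": 3, "host": "a", "region": "us", "usage": 0.49},
    {"time": 4, "host": "b", "region": "us", "usage": 0.71},
    {"time": 5, "host": "b", "region": "us", "usage": 0.70},
    {"time": 6, "host": "b", "region": "us", "usage": 0.73},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
encoded_host = run_length_encode(columns["host"])
encoded_region = run_length_encode(columns["region"])
print(encoded_host)
print(encoded_region)
```

Six repeated tag values shrink to two pairs for `host` and one pair for `region`, while the `usage` column could use a different encoding better suited to floating point values.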
Other additions to the next generation version of InfluxDB include native support for SQL queries and federation at the edge, so that users can work with their high-fidelity data in a way that makes sense for their use cases. IOx-powered InfluxDB is available in InfluxDB Cloud now.
For more information on IOx, check out the announcement.