InfluxData Blog - Paul Dix

What’s New in InfluxDB 3.2: Explorer UI Now GA Plus Key Enhancements

Paul Dix (InfluxData) — Mon, 30 Jun 2025 01:00:00 +0000

InfluxDB 3.2 is now available for both Core and Enterprise, bringing the general availability of InfluxDB 3 Explorer, a new UI that simplifies how you query, explore, and visualize data. On top of that, 3.2 includes a wide range of performance improvements, feature updates, and bug fixes.

InfluxDB 3 Core is free and open source, optimized for recent data, and licensed under MIT and Apache 2. InfluxDB 3 Enterprise builds on that foundation with support for longer-term historical queries, high availability, enhanced security, and multi-node deployments.

With 3.2, we’re focused on helping developers handle real-world problems faster, with less overhead and a better developer experience from the start.

InfluxDB 3 Explorer: a new UI for InfluxDB 3

Explorer is a new user interface for working with Core and Enterprise. It brings everything into one place, from getting data in to querying, visualizing, and managing your database. It’s designed to remove friction: fewer tools, less context switching, and faster feedback.

Getting Started with InfluxDB 3

When you first launch Explorer, you’ll get an optional onboarding guide that adjusts to your experience level. If you’re new to InfluxDB, it provides a full walkthrough. Experienced users can skip straight to the editor.

Explorer helps you learn how to write data into InfluxDB 3. From there, you can get started immediately—use a built-in sample dataset, import your own CSV or JSON, or generate a connection string to write from code or Telegraf.

Configuration & Management

Explorer provides point-and-click access to common admin tasks. You can create and drop databases, manage tokens, or auto-generate the server startup command.

All of this is optional if you prefer the CLI, but it’s nice to have it in the UI when you need it.

Querying & Visualizations

Explorer makes it easy to explore your schema and start querying right away. Write SQL with autocomplete, view results as tables or charts, and refine your query as you go. When you’re ready to use it in your app, just click View Code to copy a runnable version.

Built-In Integrations

Explorer includes integrations that make working with time series smoother:

OpenAI + Natural Language Search: Add your OpenAI token and run SQL queries using everyday language. Explorer passes your schema for context and builds a runnable query.
Grafana Export: You can also export queries straight to a Grafana dashboard with a few clicks. Perfect for users already visualizing InfluxDB data with Grafana.

Run It Your Way

Explorer runs as a standalone Docker container that you can deploy alongside any InfluxDB 3 instance. Run it in Admin mode to get full functionality with InfluxDB 3 Core or Enterprise. Alternatively, use Query mode to give users read-only access, which is ideal for connecting to InfluxDB 3 Cloud Serverless, Cloud Dedicated, or Clustered deployments. You control the access level depending on your use case.

Smarter data retention: now per table in Enterprise

InfluxDB 3.2 adds new flexible and granular data retention policies, helping developers manage storage costs and data lifecycle requirements with greater precision:

In Core, you can now configure retention policies per database.
In Enterprise, you can go even further with per-table retention policies, allowing you to define different data lifecycles for different workloads within the same database.

This is a big step for teams that require tighter control over data retention and costs without creating multiple databases.

Soft deletes are also now supported across Core and Enterprise, with a default 72-hour grace period before permanent removal. See the release notes for the full list of updates: Core release notes, Enterprise release notes.

Getting started

Download InfluxDB 3.2 or pull the latest Docker image for Core and Enterprise.

Check out the docs for details, and let us know what you think in Discord or Community Slack. Your feedback helps shape where we go next.

Coming soon: Our 3.3 release is coming up next month with more improvements and functionality, including new managed plugins for the Processing Engine, making it easier to address common time series tasks just by installing a plugin.

InfluxDB 3 Core & Enterprise GA: The Next Generation Time Series Platform for Developers is Here

Paul Dix (InfluxData) — Tue, 15 Apr 2025 05:00:00 +0000

After months of development, testing, and community feedback, we’re excited to announce the general availability (GA) release of InfluxDB 3 Core and InfluxDB 3 Enterprise. This release brings us closer to our vision for InfluxDB: a time series database that helps developers solve the problem of collecting, analyzing, monitoring, and acting on data across sensors, networks, servers, and applications. We view time series as a way to analyze, monitor, and act on data through time. This can take the form of pre-computed time series stored in a database, which previous versions of InfluxDB excel at, and computing time series on the fly from raw event and observational data of all kinds. InfluxDB 3 brings us closer to realizing this broader vision.

This marks the culmination of our four-and-a-half-year journey rebuilding InfluxDB from the ground up for modern time series data workloads. We undertook this significant effort to deliver three key new capabilities that have long been requested by our users and customers: infinite cardinality in their time series data, support for keeping historical data on object storage while making it available for real-time queries, and a fully featured, standards-compliant SQL query engine.

When we evaluated the solution space for bringing these new capabilities to InfluxDB, we realized this would require a significant architectural shift. The result is here today with InfluxDB 3, a modern time series platform supporting infinite cardinality, designed with a “diskless” architecture, an embedded Python VM for collecting, processing, alerting, and automating on data in real-time, and the most operationally simple clustered InfluxDB architecture we’ve ever created.

Both InfluxDB 3 Core and Enterprise are based on that same architecture and purpose-built for the future of time series data:

InfluxDB 3 Core is our new open source product, a high-speed, recent data engine that’s permissively licensed under MIT/Apache 2. It runs as a single process that’s easy to set up and start using right away.
InfluxDB 3 Enterprise is a commercial version of Core with performance optimizations for queries over time ranges longer than an hour, as well as support for longer-term historical queries, high availability, enhanced security, and multi-node deployments.

Since launching our alpha in January and beta in March, we’ve focused on stability, performance optimization, and incorporating user feedback.

If you’re interested in downloading and using the software right away, we’ve made it easy to get started. Whether you want to explore the open source version or kick off a 30-day trial of Enterprise, there’s no need for setup calls or sales chats—just the software, on your terms.

Getting started guides for Core and Enterprise
Open source repo for InfluxDB 3 Core

Technical architecture: the FDAP foundation

InfluxDB 3 is built on what we call the FDAP stack: Apache Flight, DataFusion, Arrow, and Parquet. Paired with the ability to run with all persisted state on object storage (or local disk if preferred), this enables new features and advantages over previous versions of InfluxDB. Apache Arrow, an in-memory columnar format, enables data sharing across system components and fast vectorized query execution. DataFusion is a fully-featured vectorized SQL query engine that takes full advantage of this format. By contributing to DataFusion, we stay close to the engine and the community driving it forward. Every performance gain and new feature flows straight into InfluxDB 3, accelerating its maturity with each release. Parquet serves as our persistence format, which gives great compression of our time series data. Paired with DataFusion’s native support for Parquet and push-down optimizations, we achieve exceptional performance on time series analytic queries that were out of reach of previous versions of InfluxDB.

Flight and FlightSQL give InfluxDB 3 the ability to serve query responses with millions of rows without expensive serialization. We also have an easy-to-use HTTP API for requests that don’t require the performance of FlightSQL.
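To give a feel for the HTTP path, here is a minimal sketch that only builds a query request and does not send it. It assumes an `/api/v3/query_sql` endpoint that takes `db`, `q`, and `format` parameters with bearer-token authentication; the host, port, database, token, and the `build_query_request` helper are placeholders for illustration, so check the API reference for your version.

```python
import urllib.parse
import urllib.request


def build_query_request(host, database, sql, token, fmt="json"):
    """Build (but do not send) an HTTP request for a SQL query.

    Assumption: the server exposes /api/v3/query_sql and accepts the
    db, q, and format query parameters with a Bearer token.
    """
    params = urllib.parse.urlencode({"db": database, "q": sql, "format": fmt})
    url = f"{host.rstrip('/')}/api/v3/query_sql?{params}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder values for illustration only
request = build_query_request(
    "http://localhost:8181",
    "monitoring",
    "SELECT * FROM cpu_metrics LIMIT 5",
    "my-token",
)
print(request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it to a running server.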

Developing InfluxDB on the FDAP stack gives us a flexible, efficient architecture that delivers significant performance gains—higher ingest, better compression, faster queries, and no limits on cardinality. It provides efficient storage and real-time analytics, even as your data grows in volume and complexity.

For developers, it means building on a system that’s fast, reliable, and purpose-built for time series. Operationally, it handles high-concurrency workloads, maintains data integrity, and stays resilient under pressure. Architecturally, it scales both out and up, adapting seamlessly as workloads change.

That foundation is possible because we’re not just using open standards like Arrow, Parquet, and DataFusion—we’re helping shape them. Alongside InfluxDB 3 Core, we’re investing in the FDAP stack and contributing upstream, often leading the work. Those improvements flow directly into InfluxDB 3, delivering better features, easier integration, and software tested across a wide range of environments. That’s the value of open source: maturity and resilience you don’t get from closed systems.

InfluxDB 3 Core architecture

InfluxDB 3 Core gives developers a high-speed, recent-data engine optimized for real-time workloads. It’s designed to collect, process, and persist data to local disk or object storage while serving queries against recent data directly from RAM. This makes it ideal for real-time system monitoring, edge data transformation, streaming analytics, and sensor-based alerting—any use case where low-latency insights are critical.

Diskless Architecture

One of the most significant architectural shifts in InfluxDB 3 is the ability to operate “diskless” – using object storage (S3, GCS, Azure Blob, etc.) as the only persistence layer. While we maintain the ability to operate solely with a local disk and no object store, this architecture enables a number of features out of the box. Specifically:

Multi-AZ durability without complex replication
Separation of compute and storage
Stateless operation for easier deployment and scaling in Enterprise
Zero data migration when upgrading or moving instances
Fault tolerance and isolation in multi-node Enterprise deployments

All of this is paired with extensive buffering and caching to ensure that queries against recent or hot data can be served entirely from RAM without a single request to the object store. This makes it possible to deliver query response times measured in tens of milliseconds.

Improved Query Performance with the Last Value Cache and Distinct Value Cache

We’ve introduced several features in InfluxDB 3 Core (and Enterprise) to optimize common query patterns in time series data. The Last Value Cache (LVC) and the Distinct Value Cache (DVC) are built to answer queries in under 10 milliseconds, enabling responsive UIs and fast monitoring systems.

Last Value Cache: Configurable to keep the N most recent values for specified tag hierarchies. For example, you can instantly retrieve the latest metrics for a specific sensor, all sensors on a machine, or all sensors in a factory.
Distinct Value Cache: Ultra-fast lookups for tag values and series metadata, making UI dropdowns and exploratory queries responsive.
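The idea behind the Last Value Cache can be illustrated with a toy model in plain Python. This is a conceptual sketch only, not how InfluxDB implements it: the `LastValueCacheSketch` class, its methods, and the tag names are all made up for the example.

```python
from collections import defaultdict, deque


class LastValueCacheSketch:
    """Toy model: keep the N most recent values per tag combination."""

    def __init__(self, key_columns, count=1):
        self.key_columns = key_columns  # e.g. ["factory", "machine", "sensor"]
        self.count = count
        # Each key gets a bounded deque, so older values fall off automatically
        self._values = defaultdict(lambda: deque(maxlen=count))

    def write(self, tags, value):
        key = tuple(tags[col] for col in self.key_columns)
        self._values[key].append(value)

    def latest(self, **filters):
        """Return recent values for every key matching the given tag filters."""
        result = {}
        for key, values in self._values.items():
            tags = dict(zip(self.key_columns, key))
            if all(tags[col] == val for col, val in filters.items()):
                result[key] = list(values)
        return result


cache = LastValueCacheSketch(["factory", "machine", "sensor"], count=2)
cache.write({"factory": "f1", "machine": "m1", "sensor": "temp"}, 20.5)
cache.write({"factory": "f1", "machine": "m1", "sensor": "temp"}, 21.0)
cache.write({"factory": "f1", "machine": "m1", "sensor": "temp"}, 21.7)
cache.write({"factory": "f1", "machine": "m2", "sensor": "temp"}, 19.3)

one_sensor = cache.latest(factory="f1", machine="m1", sensor="temp")
whole_factory = cache.latest(factory="f1")
```

Filtering on fewer tags widens the result, which mirrors the sensor, machine, and factory levels described above.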

Processing Engine: an embedded Python VM

One of the most significant additions in InfluxDB 3 is the embedded Python Processing Engine. It brings computation directly into the database, allowing data transformation, enrichment, monitoring, and alerting without external services or pipelines. The Processing Engine runs a lightweight Python VM inside the database and executes your code based on triggers—on ingest, on a schedule, or in response to an HTTP request.

This isn’t just a new feature—it simplifies the entire stack. You define the logic, and the database runs it, with no extra infrastructure or glue code—just real-time processing where your data already lives.

The Processing Engine enables:

Real-time data transformation and enrichment
Data collection and pulling data into the database
Custom monitoring and alerting logic
Data downsampling and aggregations
Integration with external systems
HTTP endpoints for custom API creation
Running LLM-generated Python scripts for simple automation

Execution of these plugins is based on triggers set by the user and operator. The trigger types are:

WAL Flush: process batches of data on write
Schedule: run on a configurable timer (from milliseconds to hours or days)
Request: bind to an HTTP endpoint for on-demand processing
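As a rough sketch, each trigger type maps to a Python entry point in a plugin file. The `process_writes` signature matches the plugin example published on this blog; the `process_scheduled_call` and `process_request` signatures are assumptions to verify against the plugin documentation, and `FakeLocal` is a stand-in used only so the example runs outside the database.

```python
class FakeLocal:
    """Stand-in for the object the engine passes to plugins (testing only)."""

    def __init__(self):
        self.logs = []

    def info(self, message):
        self.logs.append(message)


def process_writes(influxdb3_local, table_batches, args=None):
    # WAL flush trigger: called with batches of newly written rows
    for batch in table_batches:
        influxdb3_local.info(f"{batch['table_name']}: {len(batch['rows'])} rows")


def process_scheduled_call(influxdb3_local, call_time, args=None):
    # Schedule trigger: called on a timer (signature is an assumption)
    influxdb3_local.info(f"scheduled run at {call_time}")


def process_request(influxdb3_local, query_parameters, request_headers,
                    request_body, args=None):
    # Request trigger: called for an HTTP request (signature is an assumption)
    influxdb3_local.info("request received")
    return {"status": "ok"}


local = FakeLocal()
process_writes(local, [{"table_name": "cpu_metrics", "rows": [{}, {}]}])
process_scheduled_call(local, "2025-04-15T00:00:00Z")
response = process_request(local, {}, {}, None)
```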

You don’t need to learn a new DSL or deploy extra services. You write Python and focus on your logic. The Processing Engine handles the rest.

Built-in Plugin API

InfluxDB 3 Python plugins have a built-in API for interacting with the database. This includes the ability to write data back into the database, query the database using the full SQL engine, and an in-memory cache for keeping state across separate trigger calls.

We’ve set up a GitHub repository with InfluxDB 3 Plugin examples and will take pull requests from the community for plugins that might be useful to a broader audience. We’re looking forward to seeing what the community does with the engine and we intend to iterate quickly on its API and capabilities over the coming months.

InfluxDB 3 Enterprise

InfluxDB 3 Enterprise is our commercial offering built for high-scale production workloads. It builds on the Core engine and adds the features needed to run reliably at scale—whether supporting thousands of sensors, running long-term analytics, or powering critical infrastructure.

Enterprise includes:

A compactor that optimizes data layout for queries against more than an hour of data and historical queries
Security features enabling read and write access tokens for specific databases
High availability with multi-node deployments
Read replicas to scale out query and processing workloads
Workload isolation with the ability to separate ingest, query, processing, and compaction and scale each independently

Enterprise is designed for operational simplicity—whether you’re running on bare metal, VMs, containers, or Kubernetes. Its diskless architecture isolates workloads and shares only object store files, making it easy to integrate into existing infrastructure and deploy at scale.

Free Tier for Home Users

For non-commercial at-home or hobbyist use, we have a free Enterprise tier. It is limited to a single-node deployment with two cores. It includes the compaction engine, enabling better performance and historical time series queries.

We built this tier to give hobbyists access to a complete, production-grade time series database they can use for learning, experimentation, or personal projects. It’s a simple way to explore what’s possible with InfluxDB 3.

Migration and Compatibility

We’ve worked hard to build compatibility with as many of the previous versions of InfluxDB as possible. Specifically, we support:

InfluxDB 1.x and 2.x write APIs: We support the previous write APIs and InfluxDB Line Protocol
InfluxQL and the 1.x query API: We’ve implemented InfluxQL on top of the DataFusion query engine to enable users to query via legacy InfluxQL and a new v3 InfluxQL endpoint in addition to DataFusion’s native SQL

Schema Considerations

While InfluxDB 3 maintains the familiar measurement, tag/field data model mapped onto a more traditional SQL table model, there are some important schema considerations:

Tags and fields must have unique names within a measurement
Database limits: 5 in Core and 100 in Enterprise (with the ability to increase in configuration)
Table limits: 2,000 in Core and 10,000 in Enterprise (with the ability to increase in configuration)
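The unique-name rule is easy to check before writing. The helper below is a hypothetical pre-flight check, not part of any InfluxDB client: it simply reports names that appear as both a tag and a field in the same point.

```python
def find_name_collisions(tags, fields):
    """Return names used as both a tag and a field, sorted for stable output."""
    return sorted(set(tags) & set(fields))


tags = {"host": "server01", "region": "us-west"}
fields = {"usage_percent": 93.5, "region": "us-west-2"}  # "region" clashes
collisions = find_name_collisions(tags, fields)
print(collisions)
```

Renaming one side (for example, `region_detail` for the field) resolves the clash.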

Migration Paths

We’re developing two primary migration approaches:

For Enterprise Users: We’re building comprehensive data migration tools for InfluxDB Enterprise that will preserve your historical data while transitioning to the new architecture. These tools will be released in the coming months.
For Core Users: Since InfluxDB 3 Core is designed specifically for recent data (72 hours), our recommendation for migration is to mirror writes from older versions to a new InfluxDB 3 Core instance for a transition period, then switch over entirely after 72 hours.
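Mirroring writes can be as simple as sending each line protocol payload to both systems during the transition. The sketch below assumes you already have one write function per target; `mirror_write` and the in-memory stand-ins are hypothetical and would be replaced by your real write clients.

```python
def mirror_write(payload, writers):
    """Send the same line protocol payload to every writer.

    Returns a dict mapping writer name to None on success or to the
    exception raised, so one failing target does not block the other.
    """
    results = {}
    for name, write in writers.items():
        try:
            write(payload)
            results[name] = None
        except Exception as exc:  # keep going so the other write still lands
            results[name] = exc
    return results


# In-memory stand-ins for the real write clients (hypothetical)
legacy_store, v3_store = [], []
writers = {"legacy": legacy_store.append, "influxdb3_core": v3_store.append}

outcome = mirror_write("cpu_metrics,host=server01 usage_percent=93.5", writers)
```

Collecting errors per target rather than raising lets you watch the new instance for failures without putting the existing write path at risk.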

For users of Flux, we unfortunately aren’t providing a direct compatibility layer at this time. However, the combination of the Python Processing Engine, SQL, and InfluxQL should provide equivalent or improved functionality for most use cases. The plugin system is the natural successor to functionality from earlier versions, including Continuous Queries, Tasks, Kapacitor, and Telegraf. While Kapacitor and Telegraf remain compatible with InfluxDB 3, the plugin model brings that functionality directly into the database.

What’s next

The general availability of InfluxDB 3 Core and Enterprise is just the first step. Together, they deliver a solid foundation for faster analytics, deeper integrations, and smarter automation. Whether you’re just getting started or already running large-scale workloads, the new engine gives the performance and flexibility to power real-time monitoring, long-term analytics, and event-driven automation at scale. And we’re just getting started—more features, plugins, and tools to build with are on the way.

If you’re new to InfluxDB, there’s never been a better time to get started. Check out our getting started guide for Core and Enterprise, and join the conversation to share feedback and help shape what’s next:

Discord: Join #influxdb3_core on the InfluxDB Discord for direct interaction with our development team
Community Site
Reddit: r/InfluxDB
Slack: #influxdb3_core channel

Preventing Alert Storms with InfluxDB 3's Processing Engine Cache

Paul Dix (InfluxData) — Wed, 26 Mar 2025 07:00:00 +0000

A common problem in monitoring and alerting systems is not just alerting on what you’re seeing but preventing alert storms from overwhelming operators. When a system generates multiple notifications for the same incident, it leads to alert fatigue and can mask other important issues. For time series data, alert fatigue can result in missed anomalies, delayed responses to critical trends, and difficulty distinguishing real performance degradations from noise.

InfluxDB 3’s Processing Engine provides a solution to alert storms with its in-memory cache feature. This post demonstrates how to build a simple alert de-duplication system that prevents unnecessary additional notifications while delivering all alerts on important events.

The Processing Engine and in-memory cache

InfluxDB 3’s Processing Engine is an embedded Python environment that allows you to run code directly in your database, enabling real-time transformation, analytics, and responses to data as it arrives. One of its most powerful features is the in-memory cache, which enables plugins to:

Maintain state between executions
Share data across different plugins
Set expiration times for cached data
Operate in isolated or global namespaces

This stateful processing capability opens up new possibilities for designing intelligent monitoring systems directly within your database. A more intelligent monitoring system reduces noise (aka alert storms). Without those distractions, you can focus on critical issues, streamline incident response, and maintain system performance.

Building an alert de-duplication plugin

Let’s build a plugin that demonstrates how to use the cache to prevent alert storms. The basic idea is simple:

When a metric exceeds a threshold, generate an alert
Store the alert time in the cache
Implement a cooldown period during which duplicate alerts are suppressed

Here’s the complete code for our alert de-duplication plugin:

def process_writes(influxdb3_local, table_batches, args=None):
    """
    Process incoming metrics data and generate alerts with de-duplication
    to prevent alert storms.

    This plugin:
    1. Monitors incoming metrics for threshold violations
    2. Uses the in-memory cache to track alert states
    3. Implements cooldown periods to prevent alert storms
    4. Writes alert events to an 'alerts' table
    """
    # args defaults to None, so fall back to an empty dict before calling .get()
    args = args or {}

    # Get configuration from trigger arguments or use defaults
    threshold = float(args.get("threshold", "90"))
    cooldown_seconds = int(args.get("cooldown_seconds", "300"))  # 5 minutes default
    metric_table = args.get("metric_table", "cpu_metrics")
    metric_field = args.get("metric_field", "usage_percent")
    alert_type = args.get("alert_type", "high_value")

    for table_batch in table_batches:
        table_name = table_batch["table_name"]

        # Check if this table matches our configured metric table
        if table_name != metric_table:
            continue

        for row in table_batch["rows"]:
            # Check if we have the necessary fields
            if "host" not in row["tags"] or metric_field not in row["fields"]:
                continue

            host = row["tags"]["host"]
            value = row["fields"][metric_field]
            timestamp = row["timestamp"]

            # Check if the metric exceeds our threshold
            if value > threshold:
                # Construct a unique alert ID
                alert_id = f"{host}:{alert_type}"

                # Check if we're in a cooldown period for this alert
                last_alert_time = influxdb3_local.cache.get(alert_id)
                current_time = timestamp / 1_000_000_000  # Convert ns to seconds

                if last_alert_time is None or (current_time - last_alert_time > cooldown_seconds):
                    # We're not in a cooldown period, so generate a new alert
                    influxdb3_local.info(f"{alert_type} alert for {host}: {value} (threshold: {threshold})")

                    # Store the alert time in cache
                    influxdb3_local.cache.put(alert_id, current_time)

                    # Create an alert record
                    line = LineBuilder("alerts")
                    line.tag("host", host)
                    line.tag("alert_type", alert_type)
                    line.tag("metric_table", metric_table)
                    line.tag("metric_field", metric_field)
                    line.float64_field("threshold", threshold)
                    line.float64_field("value", value)
                    line.string_field("message", f"{metric_field} exceeded threshold: {value}")
                    line.time_ns(timestamp)

                    # Write the alert to the database
                    influxdb3_local.write(line)
                else:
                    # We're in a cooldown period, log this but don't generate a new alert
                    cooldown_remaining = cooldown_seconds - (current_time - last_alert_time)
                    influxdb3_local.info(
                        f"Suppressing duplicate {alert_type} alert for {host}: {value} "
                        f"(cooldown: {int(cooldown_remaining)}s remaining)"
                    )

Key concepts explained

Let’s break down how this plugin uses the cache to prevent alert storms.

1. Configurable Parameters

The plugin accepts several arguments that make it adaptable to different monitoring scenarios:

threshold = float(args.get("threshold", "90"))
cooldown_seconds = int(args.get("cooldown_seconds", "300"))  # 5 minutes default
metric_table = args.get("metric_table", "cpu_metrics")
metric_field = args.get("metric_field", "usage_percent")
alert_type = args.get("alert_type", "high_value")

This makes the plugin reusable across different metrics and alert types.

2. Unique Alert Identifiers

For each potential alert, we create a unique identifier based on the host and alert type:

alert_id = f"{host}:{alert_type}"

This allows us to track different alert types separately for each host.

3. Cache-Based Cooldown Period

The core of our alert de-duplication logic uses the in-memory cache:

last_alert_time = influxdb3_local.cache.get(alert_id)
current_time = timestamp / 1_000_000_000  # Convert ns to seconds

if last_alert_time is None or (current_time - last_alert_time > cooldown_seconds):
    # Generate alert and update cache
    influxdb3_local.cache.put(alert_id, current_time)
    # ...
else:
    # Suppress duplicate alert
    # ...

When an alert condition is detected, we check if we’re within the cooldown period for this specific alert. If not, we generate a new alert and update the cache with the current time.
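To see the cooldown behave over time, the same check can be run outside the database against a plain dict. This is a simulation for illustration: the dict stands in for `influxdb3_local.cache`, and `should_alert` is a hypothetical helper rather than part of the plugin API.

```python
def should_alert(cache, alert_id, current_time, cooldown_seconds):
    """Return True and record the time if the alert is outside its cooldown."""
    last_alert_time = cache.get(alert_id)
    if last_alert_time is None or current_time - last_alert_time > cooldown_seconds:
        cache[alert_id] = current_time
        return True
    return False


cache = {}  # plain dict standing in for influxdb3_local.cache
event_times = [0, 60, 299, 301, 400, 602]  # seconds at which the threshold is exceeded
decisions = [should_alert(cache, "server01:high_cpu", t, 300) for t in event_times]
print(decisions)
```

With a 300-second cooldown, only the events at 0, 301, and 602 seconds produce alerts. Note that the cooldown restarts from the last alert that was actually sent, not from the last suppressed one.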

4. Automatic Alert Generation

When a new alert is needed, we write to a dedicated “alerts” table:

line = LineBuilder("alerts")
line.tag("host", host)
line.tag("alert_type", alert_type)
# ...
influxdb3_local.write(line)

This creates a permanent record of alerts that can be queried for analysis or connected to notification systems. We could also enable this plugin to connect to third-party systems like PagerDuty, Slack, or Discord to send alerts.

Deploying the plugin

To deploy this plugin, save it as alert_deduplication.py in your InfluxDB plugin directory and create a trigger:

influxdb3 create trigger \
  --trigger-spec "table:system_metrics" \
  --plugin-filename "alert_deduplication.py" \
  --trigger-arguments threshold=95,cooldown_seconds=600,metric_table=system_metrics,metric_field=cpu_usage,alert_type=high_cpu \
  --database monitoring \
  cpu_alert_handler

You can create multiple triggers with different configurations to monitor various metrics:

influxdb3 create trigger \
  --trigger-spec "table:memory_metrics" \
  --plugin-filename "alert_deduplication.py" \
  --trigger-arguments threshold=85,cooldown_seconds=300,metric_table=memory_metrics,metric_field=memory_usage,alert_type=high_memory \
  --database monitoring \
  memory_alert_handler

Advanced configuration options

While our example focused on simple threshold-based alerts, you can extend this pattern to handle more sophisticated scenarios.

Dynamic Cooldown Periods

You could adjust the cooldown period based on the severity of the alert:

# Adjust cooldown period based on severity
# (calculate_severity is a helper you would define; it is assumed here to return 0-100)
severity = calculate_severity(value, threshold)
adjusted_cooldown = cooldown_seconds * (1 - severity/100)  # Shorter cooldown for more severe issues
influxdb3_local.cache.put(alert_id, current_time, ttl=adjusted_cooldown)
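The snippet leaves the severity calculation open. One possible definition, shown purely as an assumption, treats severity as the percentage by which the value exceeds the threshold, clamped to the 0-100 range; the `adjusted_cooldown` wrapper is likewise hypothetical.

```python
def calculate_severity(value, threshold):
    """Hypothetical helper: percent over threshold, clamped to 0-100."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    overshoot = (value - threshold) / threshold * 100
    return max(0.0, min(100.0, overshoot))


def adjusted_cooldown(value, threshold, cooldown_seconds):
    """Scale the cooldown down as severity rises."""
    severity = calculate_severity(value, threshold)
    return cooldown_seconds * (1 - severity / 100)


mild = adjusted_cooldown(99, 90, 600)      # 10% over threshold
extreme = adjusted_cooldown(180, 90, 600)  # 100% over threshold
print(mild, extreme)
```

With these definitions, a value 10% over the threshold shortens a 600-second cooldown to about 540 seconds, while a value at double the threshold removes the cooldown entirely. In practice you may want a minimum cooldown so that extreme values don't disable suppression altogether.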

Alert Escalation

For persistent issues, you might implement escalation after repeated alerts:

# Get alert count from cache
alert_count = influxdb3_local.cache.get(f"{alert_id}:count", default=0)
alert_count += 1
influxdb3_local.cache.put(f"{alert_id}:count", alert_count)

# Escalate if this problem has triggered multiple alerts
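if alert_count > 3:
    # Rebuild the base message so it can be prefixed with the escalation notice
    message = f"{metric_field} exceeded threshold: {value}"
    line.tag("priority", "high")
    line.string_field("message", f"ESCALATED: {message} (occurred {alert_count} times)")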

Summary

The in-memory cache feature of InfluxDB 3’s Processing Engine enables powerful stateful processing directly within your database. By implementing alert de-duplication with configurable cooldown periods, you can create smarter monitoring systems that reduce noise while ensuring you’re notified of important events.

This simple example demonstrates one way to leverage the cache in your data processing pipelines. The same pattern can be applied to rate limiting, threshold adjustments, trend analysis, and many other scenarios where maintaining state between executions is valuable.

To learn more about InfluxDB 3’s Processing Engine and explore other capabilities, check out the documentation or try out some of the example plugins contributed by the community. Download InfluxDB 3 and get started with the Processing Engine today.

InfluxDB 3 Core and Enterprise Are Now in Beta

Paul Dix (InfluxData) — Mon, 17 Mar 2025 07:30:00 +0000

Today we’re excited to announce that InfluxDB 3 Core, our new open source product licensed under MIT/Apache 2, and InfluxDB 3 Enterprise are now in beta.

InfluxDB 3 Core is a high-speed, recent-data engine that collects and processes data in real-time, while persisting it to local disk or object storage. InfluxDB 3 Enterprise is a commercial product that builds on Core’s foundation, adding high availability, read replicas, enhanced security, and data compaction for faster queries. A free tier of InfluxDB 3 Enterprise will also be available for at-home, non-commercial use, giving hobbyists the full set of historical time series database capabilities.

Since launching both products in alpha on January 13, we have been building and iterating quickly—incorporating feedback, refining features, and pushing performance improvements. Now, with beta, we’re stabilizing APIs, ensuring seamless upgrades, and gearing up for general availability in April.

In this post, we’ll highlight what’s changed since alpha, what to expect from the beta, and our plans for getting to GA. If you’ve been waiting to try out InfluxDB 3 Core and Enterprise, now is the time. Download the beta build and join us on the InfluxDB community Discord.

Key improvements since alpha

Write-Through Caching for Faster Query Performance

InfluxDB 3 Core and Enterprise are designed to operate using object storage. However, to make queries fast, we have to manage the data lifecycle to ensure that hot data is always in RAM. Even a single request to object storage for a file in the query path can blow out our target response times. We put a bunch of work into caching, specifically write-through caching, to ensure that queries for recent data are always served from RAM if the server is configured with enough cache space.
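The write-through pattern itself is simple to sketch. The toy class below is not InfluxDB's cache: dicts stand in for both object storage and RAM, and all names are made up for illustration. What it shows is that data written through the cache can be read back without touching the slower store.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a slower backing store."""

    def __init__(self, store):
        self.store = store      # stands in for object storage
        self.cache = {}         # stands in for RAM
        self.store_reads = 0    # counts slow-path reads

    def put(self, key, value):
        # Write-through: persist and cache in the same step
        self.store[key] = value
        self.cache[key] = value

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fall back to the slow path and remember the result
        self.store_reads += 1
        value = self.store[key]
        self.cache[key] = value
        return value


object_store = {"old.parquet": b"historical"}
cache = WriteThroughCache(object_store)
cache.put("recent.parquet", b"hot data")

recent = cache.get("recent.parquet")  # served from the cache
old = cache.get("old.parquet")        # first read goes to the store
print(cache.store_reads)
```

Only the read of the older file reaches the backing store. A real cache would also need an eviction policy, since cache space is finite.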

Processing Engine

The other big effort during the alpha was the build-out of the processing engine. This is an embedded Python VM for data transformation, enrichment, downsampling, and alerting—all within the database itself. It supports triggers on writes, schedules, or HTTP requests to a trigger-bound endpoint. We now have all trigger types wired up and example plugins written in the InfluxDB 3 Plugins Repo. Here’s one we wrote about recently for monitoring and alerting.

In the plugin API, we’ve added functions to query the local database, write data back into any database, and, most recently, an in-memory cache that can be accessed across separate trigger invocations. The full details are outside the scope of this post, but you can find more information on the plugin documentation page.

Simplifying Multi-Server Enterprise Clusters

For our Enterprise offering, we’ve worked on making multi-server clusters much easier to set up and manage. Nodes in a multi-node Enterprise cluster communicate by sharing data through object storage and are otherwise isolated from each other, which improves fault isolation and robustness. Enterprise makes it easy to separate ingest from compaction, query processing, and trigger processing. Operators can scale each component independently, tailoring it to their own tooling and rules.

These are just a few of the improvements we’ve made. There’s more under the hood, but these represent some of the highlights.

What to expect during beta

Moving into beta means we will no longer make breaking changes to the API. Any updates to file formats or organization will have in-place upgrade paths for builds starting with today’s beta release. This means that you can use the beta for testing and validation purposes and be sure that when you upgrade, your data will come with it, and your APIs will all work the same.

That said, we’re not recommending the software for production just yet. Our focus during beta is on testing, robustness, performance, and tooling for production deployments. We’ll make weekly releases, each with an associated changelog, to make it easy to track updates. When we release the GA, you’ll be able to upgrade any beta deployment seamlessly.

Processing engine still in alpha

The embedded Python VM, which is the processing engine, should still be considered alpha software. It is fully functional and allows users to create and share plugins that trigger off of writes, a schedule, or requests to an HTTP endpoint. However, we want to continue iterating on user feedback and making changes where they make sense.

While we don’t anticipate any breaking changes, we’re not yet committed to the current API as the long-term support target. The feedback we get during this phase will be important to fine-tune the API for our users’ needs.

Files in object store

InfluxDB 3 Core and Enterprise support a “diskless” architecture, keeping all state in object storage. This means that as an operator, you can inspect the files that the database puts there. However, we want to point out that the specific file layout and format should not be considered part of a stable API. Only the HTTP and Apache Arrow Flight APIs should be considered stable.

We may evolve the organization, layout, and file formats over time. From this point forward, any changes will work with in-place upgrades. However, access to the data in the database is stable only through the regular front door API.

Path to general availability

The InfluxDB 3 Core & Enterprise betas represent the feature set we intend to ship at general availability. From now until GA, we’re focusing on testing, performance, robustness, and tooling.

We’re also launching a lighthouse customer program for early adopters of InfluxDB 3 Enterprise. If you’re interested in being an early customer, please get in touch.

We expect to have the generally available release in the late April timeframe. We look forward to any feedback on either Core or Enterprise.

Announcing InfluxDB 3 Enterprise free for at-home use and an update on InfluxDB 3 Core’s 72-hour limitation

Paul Dix (InfluxData) — Mon, 27 Jan 2025 01:00:00 +0000

Two weeks into the alpha release of InfluxDB 3 Core (our new open source offering) and InfluxDB 3 Enterprise (our newest commercial offering), we’ve received a good amount of feedback that the 72-hour limitation in Core is too restrictive. This feedback fell into three categories:

At-home users using InfluxDB for home sensors and systems metrics, who often look at weeks, months, or even a year of data
Open source users who expect to be able to write data from any time frame and query any data they write to the database
Open source users who expect InfluxDB 3 Core to be able to query large historical ranges of data, just like InfluxDB 1 & 2 open source can

For the users in category 1, we’re announcing a free tier of InfluxDB 3 Enterprise for at-home, non-commercial use. It will be rate limited in some way, but our intention is to give a free option with all the capabilities of Enterprise for these at-home users. If you’re an at-home user interested in this, please reach out on our community Discord and tell us more about what kinds of rate limits will work for your use case.

For the users in category 2, we’ve lifted the limitation on what time-stamped data can be written to InfluxDB 3 Core (you can write for any historical period) and we’ve lifted the limitation on what period of time can be queried. However, the range of time a single query can cover is still limited to a number of hours, due to specific implementation details that I’ll cover later in this post. InfluxDB 3 Core is optimized for querying short ranges of time (i.e., hours, not days).

For the users in category 3, we understand that InfluxDB 3 Core doesn’t cover everything that InfluxDB 1 and 2 do. It’s designed to fill a unique role in the time series toolkit, offering a highly performant, recent-data engine. For those who are on versions 1 and 2 and are happy with them, there’s no reason to move off. For users requiring the ability to query longer ranges of time, this is one of the capabilities we sell in the InfluxDB 3 product line.

To see why this limitation exists in Core, but not in Enterprise (our commercial offering), the rest of this post gets into the technical details.

InfluxDB 3 Core and Enterprise organize data as it is written into 10-minute blocks of time based on the timestamp of the data. If you issue a write request with thousands of lines of Line Protocol that have timestamps for the same measurement, but ranging over a period of an hour, this will be split into six chunks, one for each 10-minute block of time in that hour (:00, :10, :20, :30, etc.). This data is kept in memory for fast query access and it is also written to a WAL for durability (this WAL can exist entirely in object store for diskless operation).
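To make the chunking concrete, here is a minimal Python sketch of grouping rows into 10-minute blocks by timestamp. It is an illustration only, not the database's implementation; the function names and the row shape (table, timestamp in seconds, payload) are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone

BLOCK_SECONDS = 10 * 60  # 10-minute blocks, as described above


def block_start(ts_seconds: int) -> int:
    """Round a Unix timestamp (seconds) down to the start of its 10-minute block."""
    return ts_seconds - (ts_seconds % BLOCK_SECONDS)


def chunk_by_block(rows):
    """Group (table, timestamp_seconds, payload) rows by (table, block start)."""
    chunks = defaultdict(list)
    for table, ts, payload in rows:
        chunks[(table, block_start(ts))].append((ts, payload))
    return dict(chunks)


# One hour of per-minute points for a single table, starting at 12:00:00 UTC.
base = int(datetime(2025, 1, 27, 12, 0, tzinfo=timezone.utc).timestamp())
rows = [("cpu", base + 60 * i, {"usage": i}) for i in range(60)]
chunks = chunk_by_block(rows)
print(len(chunks))  # 6 chunks, one per 10-minute block in the hour
```

A write that spans an hour for a single table lands in six chunks here, matching the example in the text.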

The WAL is periodically snapshotted to keep its size down. By default, this happens every 10 minutes, at which point the in-memory buffers will be written to Parquet files (one per measurement, i.e., table, per 10-minute block of time). The Parquet data is also put into an in-memory cache so that queries against this recently persisted data do not need to go to object storage. Once it is in the Parquet cache, the queryable WAL buffer is cleared.

When a query comes into the server, the time range of the query is examined to determine which Parquet files must be included in the query planning and then execution process. Data from the WAL buffer is always included in the query plan. Querying across many files will start to degrade performance and use up more RAM due to DataFusion reading metadata and row groups from each Parquet file.

From our testing, the more files that are included in a query, the more RAM usage there is and the slower the query gets. The impacts are even more pronounced if the files are not in the cache and the server must go to object store to get the metadata and then data. Executing a query in Core with a large range of time can potentially result in thousands of GET requests to object store and in many cases will get the database OOM killed or DataFusion will stop execution because the memory budget it has been allocated has been exhausted.

Because of these performance properties, we’ve set a configuration option that limits a query plan to 432 Parquet files, which is a 72-hour range of time given the 10-minute time blocks. This is an option that can be set on the server while starting it. We view this as a service protection mechanism. Values much higher than that will likely not yield great user experiences. Even querying across that many files will be less than ideal if you’re looking for very fast query response times.
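The arithmetic behind the limit is 72 hours × 6 blocks per hour = 432 blocks. The sketch below counts the 10-minute blocks a query range overlaps and rejects ranges that exceed the limit. It assumes one Parquet file per table per block and a single-table query; the function names are hypothetical and the real planner's accounting may differ.

```python
BLOCK_SECONDS = 10 * 60
MAX_PARQUET_FILES = 432  # the limit quoted above


def blocks_in_range(start_s: int, end_s: int) -> int:
    """Number of 10-minute blocks overlapped by the half-open range [start_s, end_s)."""
    if end_s <= start_s:
        return 0
    first = start_s // BLOCK_SECONDS
    last = (end_s - 1) // BLOCK_SECONDS
    return last - first + 1


def check_query_range(start_s: int, end_s: int, limit: int = MAX_PARQUET_FILES) -> int:
    """Return the file count for the range, or raise if it exceeds the limit."""
    files = blocks_in_range(start_s, end_s)
    if files > limit:
        raise ValueError(f"query would touch {files} files; limit is {limit}")
    return files


# A block-aligned 72-hour range touches exactly 72 * 6 = 432 blocks.
print(check_query_range(0, 72 * 3600))
```

Note that a 72-hour range that is not aligned to a block boundary overlaps 433 blocks in this model, so a strict limit of 432 would reject it.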

InfluxDB 3 Enterprise lifts this limitation by including a compactor that rewrites these 10-minute files into larger blocks of time. Not only does it rewrite those files, it sorts the data by series and writes out a separate index that lets the query engine know which files contain what data. This is what makes it possible for InfluxDB 3 Enterprise to query across larger time ranges of data. It also enables Enterprise to give faster query response times on any time range that spans longer than an hour.
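As a rough illustration of what such a compactor does, the following toy sketch merges 10-minute chunks into larger blocks, sorts rows by series, and builds an index from series to the blocks that contain it. The one-hour target size, the row shape, and all names are assumptions for the example; the post does not describe the actual block sizes or index format.

```python
from collections import defaultdict


def compact(small_files, target_seconds=3600):
    """Merge {block_start: [(series, ts, value), ...]} into larger sorted blocks.

    Returns (compacted, index), where index maps each series to the sorted list
    of compacted block starts that contain rows for it.
    """
    merged = defaultdict(list)
    for start, rows in small_files.items():
        merged[start - start % target_seconds].extend(rows)

    # Sort each larger block by series, then by time within the series.
    compacted = {
        start: sorted(rows, key=lambda r: (r[0], r[1])) for start, rows in merged.items()
    }

    index = defaultdict(set)
    for start, rows in compacted.items():
        for series, _ts, _value in rows:
            index[series].add(start)
    return compacted, {series: sorted(starts) for series, starts in index.items()}


# Seven 10-minute chunks: six in the first hour, one in the second.
small = {600 * i: [("sensor-b", 600 * i, i), ("sensor-a", 600 * i + 1, i)] for i in range(6)}
small[3600] = [("sensor-a", 3600, 99)]
compacted, index = compact(small)
print(sorted(compacted))  # [0, 3600]
print(index)              # {'sensor-a': [0, 3600], 'sensor-b': [0]}
```

With the index, a query for sensor-b only needs to open one file instead of scanning every block in the range.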

We think that InfluxDB 3 Core offers a compelling set of features for real-time, recent time series data. With the embedded Python processing engine and API, it makes Core ideal to act as a data collector that has store and forward capability and the ability to query data in real-time as it is ingested. It can also do ETL, monitoring and alerting, and shipping of data to object storage and other third-party services. This is all in addition to acting as a diskless time series database for recent data.

InfluxDB 3 Enterprise represents a full historical time series database along with everything that Core enables. It’s one of the products we sell. Building a sustainable business is what enables us to continue building InfluxDB 3 Core as a permissively licensed open source project. It also enables us to continue contributing and driving forward the state of the art in query engines with our work in Apache DataFusion, Arrow, Parquet, and the Rust object store crate.

If InfluxDB 3 Core sounds like it meets the needs of a project you’re working on, we hope you’ll give it a try.

InfluxDB 3 Open Source Now in Public Alpha Under MIT/Apache 2 License

Paul Dix (InfluxData) — Mon, 13 Jan 2025 06:30:00 +0000

New InfluxDB 3 Core and InfluxDB 3 Enterprise products now available for alpha testing.

Today we’re excited to announce the alpha release of InfluxDB 3 Core (download), the new open source product in the InfluxDB 3 product line along with InfluxDB 3 Enterprise (download), a commercial version that builds on Core’s foundation. InfluxDB 3 Core is a recent-data engine for time series and event data. InfluxDB 3 Enterprise adds historical query capability, read replicas, high availability, scalability, and fine-grained security.

For open source, we knew it was important to build a product that could run as a single process that would be easy to set up and start using right away. We also realized that many of our customers wanted an operationally simple database as an option either instead of or in addition to our scalable distributed system. The result is InfluxDB 3 Core in the open source (dual-licensed under MIT or Apache 2), and InfluxDB 3 Enterprise, a commercial version of the core open source offering.

These products are built on more than four years of development and powered by the FDAP stack—Apache Flight, DataFusion, Arrow, and Parquet—and delivered on our rebuilt time series database architecture. They deliver all the key capabilities of InfluxDB 3, including unlimited cardinality, native object storage support, and a powerful SQL query engine, while maintaining our commitment to the open source community.

I’ll dive into both products, focusing on how they address critical gaps in the developer toolset for time series data. I’ll also address several related topics in detail, outlined below:

InfluxDB 3 open source will be called InfluxDB 3 Core, a recent-data engine persisting Parquet files and enabling queries against the last 72 hours of data. [Editor’s note: limit against historical data lifted. See January 27, 2025 update for more information.] Development of Core will carry on under the permissive MIT or Apache 2 license.
In addition to releasing InfluxDB 3 Core, we are releasing InfluxDB 3 Enterprise, a commercial version of the open source Core product.
Key features of Core and Enterprise and the unique spot they fill in the time series toolset with diskless architecture, fast recent data processing, and embedded Python for plugins and triggers.
Compatibility with previous InfluxDB versions, migration tooling, and what to expect when upgrading from 1.x/2.x.
Our commitment to open source, permissive licensing, and InfluxData’s continued focus on maintaining a clear distinction between open source and commercial products.

If you’re interested in downloading and using the software right away, you can find the getting started guides for both Core and Enterprise. You can find the open source repo for InfluxDB 3 Core here. During the alpha period, InfluxDB 3 Enterprise can be accessed with a free, time-limited trial. For Enterprise, we only ask for your email address during the setup process so you can get started without talking to anyone.

InfluxDB 3 Core key features

InfluxDB 3 Core gives developers a new tool for time series data management—a high-performance recent-data engine optimized for querying the last 72 hours of data [Editor’s note: limit against historical data lifted, see update]. This focused approach enables Core to deliver exceptional performance for real-time monitoring, data collection, and streaming analytics use cases. By optimizing specifically for this pattern, we’ve achieved query response times under 10ms for last-value queries and under 50ms for hour-long ranges.

“By optimizing for the most common use cases, we've created a system that delivers exceptional performance while remaining truly open source under permissive licensing.”

InfluxDB 3 Core is designed to operate either with a local disk and no dependencies or “diskless,” using object storage (e.g., S3) for all data. Paired with an embedded Python VM for writing plugins and a last value and distinct value cache, InfluxDB 3 Core is a useful data collector, monitoring agent, and recent time series database that persists data into Parquet files for long-term storage and access by third-party systems.

Diskless Architecture

A key feature of Core (and Enterprise) is their ability to operate in “diskless” mode, using object storage as the only persistence layer. While they maintain the ability to operate with only a local disk, the option to run statelessly using only object storage enables more dynamic operating environments. In these environments, data can be accessed seamlessly by third-party systems that can read from the object store.

Writes into the database are validated and buffered into an in-memory WAL that is flushed once per second to object storage. Writers can either receive a response after flush, guaranteeing durability, or receive an immediate response after validation. After being flushed, this data is put into an in-memory Arrow buffer that is queryable.

The WAL is periodically snapshotted, persisting the in-memory Arrow buffers to Parquet files on object storage. This process deletes the WAL files with data persisted to Parquet and writes snapshot files containing a summary of what was persisted. This keeps the WAL size down and manageable.
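The write path described above (validate, buffer, flush, snapshot) can be sketched as a toy in-memory model. This is not how the database is implemented; Python lists stand in for object storage, the Arrow buffer, and Parquet files, and every name here is hypothetical.

```python
class ToyWal:
    """Toy model of the write path: write -> flush -> snapshot."""

    def __init__(self):
        self.pending = []       # validated writes not yet flushed
        self.wal_files = []     # flushed WAL segments (stand-in for object storage)
        self.query_buffer = []  # flushed rows, now queryable (stand-in for Arrow)
        self.persisted = []     # snapshot output (stand-in for Parquet files)

    def write(self, row):
        """Validate a row and buffer it; it is not yet durable or queryable."""
        if "time" not in row:
            raise ValueError("row needs a 'time' key")
        self.pending.append(row)

    def flush(self):
        """Run once per second in the description above."""
        if not self.pending:
            return
        segment = list(self.pending)
        self.wal_files.append(segment)      # durable after this point
        self.query_buffer.extend(segment)   # queryable after this point
        self.pending.clear()

    def snapshot(self):
        """Persist the queryable buffer, drop covered WAL files, return a summary."""
        if not self.query_buffer:
            return None
        self.persisted.append(list(self.query_buffer))
        summary = {"rows": len(self.query_buffer), "wal_files_removed": len(self.wal_files)}
        self.wal_files.clear()
        self.query_buffer.clear()
        return summary


wal = ToyWal()
wal.write({"time": 1, "usage": 0.4})
wal.write({"time": 2, "usage": 0.6})
wal.flush()
print(wal.snapshot())  # {'rows': 2, 'wal_files_removed': 1}
```

A writer that waits for `flush` gets the durability guarantee; one that returns after `write` gets the immediate response after validation.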

Each host that writes data into object storage persists all files with a path starting with a unique identifier assigned by the user at startup time. Because all data is kept in object storage, we get all the benefits that come along with that: Multi-AZ durability guarantees, backup utilities, and the entire ecosystem of third-party object storage tooling. If a writing host goes down for some reason, a new host can be spun up with the identifier of the old host and pick up where it left off.

Third-party query engines, data lakes, and data warehouses can directly query the Parquet files that Core lands in object storage, giving users more ways to access their historical time series data. We chose Parquet as the persistence format specifically because of its broad adoption in the data ecosystem. This has become even more important as the Iceberg Catalog format has gained in popularity. InfluxDB 3 makes a great agent for landing real-time data in object storage and Iceberg Catalogs.

Fast Recent Data

InfluxDB 3 has features designed for fast access to recent data. This includes the in-memory buffer, Parquet cache, Last Value Cache, and Distinct Value Cache. Our performance targets are to query last values and distinct values in under 10 milliseconds, the last hour in under 50 milliseconds, and queries up to 72 hours in the past in less than a few hundred milliseconds [Note: limit against historical data lifted]. Making this possible with object storage used for persistence means we have a variety of in-memory caches.

The in-memory buffer serves as the fast query path for data in the WAL that has not yet been converted to Parquet and persisted. It is kept in the Arrow format in builders and appended to as data arrives. As data is snapshotted from the WAL, buffered to Parquet, and persisted to object storage, we write it into an in-memory Parquet cache before clearing it from the buffer. This means that for recent data, we should never have to touch object storage to answer a query.

The Last Value Cache is a new feature that lets users configure the database to cache the last N values seen for individual series, specific column values, or on a hierarchy. This can be done on a per-table basis or across the database as a whole. For example, if you have sensor data and you have the columns site_name, machine_id, and sensor_id, you could configure the last value cache to keep values on that hierarchy (site -> machine -> sensor) and then quickly get back the last two values seen for a specific sensor, all sensors within a machine, or all sensors within an entire site. The cache acts as an in-memory round-robin database that gets populated as WAL flushes occur (every second by default).
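One way to picture the Last Value Cache is a map from a key hierarchy to a small fixed-size buffer per series. The sketch below is a conceptual model only, not the database's data structure or API; the class and method names are made up for illustration, and it does a linear scan over series where a real implementation would presumably walk the hierarchy.

```python
from collections import defaultdict, deque


class LastValueCache:
    """Keep the last `count` rows per series, keyed by a column hierarchy."""

    def __init__(self, key_columns, count):
        self.key_columns = tuple(key_columns)
        self._series = defaultdict(lambda: deque(maxlen=count))

    def insert(self, row):
        key = tuple(row[col] for col in self.key_columns)
        self._series[key].append(row)  # the oldest row falls off when full

    def last(self, **match):
        """Return {series_key: [rows]} for series matching the given key values."""
        positions = {col: self.key_columns.index(col) for col in match}
        return {
            key: list(rows)
            for key, rows in self._series.items()
            if all(key[positions[col]] == value for col, value in match.items())
        }


cache = LastValueCache(["site_name", "machine_id", "sensor_id"], count=2)
for t in range(3):
    for machine, sensor in [("m1", "s1"), ("m1", "s2"), ("m2", "s3")]:
        cache.insert({"site_name": "A", "machine_id": machine, "sensor_id": sensor,
                      "time": t, "value": t * 10})

one_sensor = cache.last(site_name="A", machine_id="m1", sensor_id="s1")
one_machine = cache.last(site_name="A", machine_id="m1")
whole_site = cache.last(site_name="A")
print(len(one_sensor), len(one_machine), len(whole_site))  # 1 2 3
```

Each matching series returns only its last two rows, which is the round-robin behavior the text describes.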

The Distinct Value Cache is another new feature that lets users configure the database to cache the unique values seen for a column or hierarchy of columns, similar to the way the Last Value Cache works. It populates on WAL flushes (every second), just like the Last Value Cache. While this same information is accessible via the SQL engine against the raw data, the Distinct Value Cache is designed to return values in 10 to 30 milliseconds, making it a great fit for building snappy UI experiences.
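Conceptually, a distinct value cache over a column hierarchy can be modeled as a tree of nested dictionaries. The following sketch is an assumption-laden illustration with hypothetical names, not the database's implementation.

```python
class DistinctValueCache:
    """Track distinct values along a column hierarchy as nested dicts."""

    def __init__(self, columns):
        self.columns = tuple(columns)
        self._tree = {}

    def insert(self, row):
        node = self._tree
        for col in self.columns:
            node = node.setdefault(row[col], {})

    def distinct(self, *parents):
        """Distinct values of the next column beneath the given parent values."""
        node = self._tree
        for value in parents:
            node = node.get(value)
            if node is None:
                return []
        return sorted(node)


dvc = DistinctValueCache(["site_name", "machine_id"])
dvc.insert({"site_name": "A", "machine_id": "m1"})
dvc.insert({"site_name": "A", "machine_id": "m2"})
dvc.insert({"site_name": "A", "machine_id": "m1"})  # duplicate, no effect
dvc.insert({"site_name": "B", "machine_id": "m9"})

print(dvc.distinct())     # ['A', 'B']
print(dvc.distinct("A"))  # ['m1', 'm2']
```

Lookups like these are what a UI dropdown needs (all sites, then all machines in a site), which is why the text calls out snappy UI experiences.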

Plugins and Triggers via Embedded Python

As part of this alpha release, we’re testing the experience for a new plugin system that lets users define Python scripts that can collect, process, transform, and monitor data on the fly directly in the database. It comes with an all-new API and development process. It’s still at a very early stage, so we’ll be iterating on the functionality and exact developer experience—there may be breaking API changes during this time.

“We’re excited about the range of possibilities the plugin system will enable, particularly when paired with the fast recent data query engine and last value cache. We picked Python because of its broad adoption and the ability of most LLMs to write short Python scripts.”

The plugin system is the logical successor to functionality in earlier versions of InfluxDB, like Continuous Queries, Tasks, Kapacitor, and Telegraf. While Kapacitor and Telegraf continue to work with InfluxDB 3, the plugin system brings this functionality directly into the database. This system enables:

Custom data collection and transformation
Real-time monitoring and alerting
Integration with third-party services
Scheduled task execution
Downsampling and aggregation
HTTP endpoint creation for custom APIs

Users can define plugins that are triggered by various data lifecycle events in the database. The plugin API includes the ability to query the database, write data back into the database, and connect to any third-party service enabled through Python’s ecosystem of libraries and tools. The trigger points for plugins are:

On WAL flush sends a batch of write data to a plugin once a second (can be configured).
On Schedule executes plugin on a schedule configured by the user, and is useful for data collection and deadman monitoring.
On Request binds a plugin to an HTTP endpoint at /api/v3/plugins/<name> where request headers and content are sent to the plugin, which can then parse, process, and send the data into the database or to third-party services.
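The three trigger points above map naturally onto three Python entry points. The sketch below shows the general shape of such a plugin. The function names, argument lists, and the layout of `table_batches` follow my reading of the plugin documentation and should be treated as assumptions, especially since the API was still changing at the time of this post. `FakeLocal` is a hypothetical stand-in for the object the database passes in, included only so the example runs on its own.

```python
from collections import Counter


def process_writes(influxdb3_local, table_batches, args=None):
    """WAL-flush trigger: receives the rows written since the previous flush."""
    counts = Counter()
    for batch in table_batches:
        counts[batch["table_name"]] += len(batch["rows"])
    for table, n in sorted(counts.items()):
        influxdb3_local.info(f"{table}: {n} rows in this flush")


def process_scheduled_call(influxdb3_local, call_time, args=None):
    """Schedule trigger: runs on the configured schedule (e.g. a deadman check)."""
    influxdb3_local.info(f"scheduled run at {call_time}")


def process_request(influxdb3_local, query_parameters, request_headers,
                    request_body, args=None):
    """HTTP trigger: runs when the bound endpoint receives a request."""
    influxdb3_local.info(f"request with {len(request_body or b'')} body bytes")
    return {"status": "ok", "params": dict(query_parameters)}


class FakeLocal:
    """Hypothetical stand-in for the object the database passes to plugins."""

    def __init__(self):
        self.messages = []

    def info(self, message):
        self.messages.append(message)


local = FakeLocal()
batches = [{"table_name": "cpu", "rows": [{"usage": 0.5}, {"usage": 0.7}]}]
process_writes(local, batches)
print(local.messages)  # ['cpu: 2 rows in this flush']
```

In a real deployment the database calls these functions itself; the stub at the bottom exists only to exercise them locally.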

We’re excited about the range of possibilities the plugin system will enable, particularly when paired with the fast recent data query engine and last value cache. We picked Python because of its broad adoption and the ability of most LLMs to write short Python scripts. We think that with the tools available today, even non-programmers will be able to create plugins in the database to solve their domain-specific problems.

InfluxDB 3 Enterprise

InfluxDB 3 Enterprise is the second product we’re announcing today, which builds on Core’s foundation with the following capabilities:

High availability configuration
Read replicas for query and plugin processing scalability
Enhanced security features
Historical data compaction and indexing to enable faster queries for anything over one hour
Row-level delete support (coming soon)
Integrated admin UI (coming soon)

Enterprise is designed for operational simplicity whether deployed on bare metal, VMs, containers, or Kubernetes. Its architecture enables the isolation of different workloads while sharing only files on object storage, making it ideal for custom deployment architectures.

We will have data migration tools from previous versions of InfluxDB to bring over historical data.

Compatibility with previous InfluxDB versions

While we weren’t able to bring forward all features from the previous versions of InfluxDB, we have worked hard to bring some of the old APIs into the new version. We’ve maintained compatibility with these existing InfluxDB features:

Support for InfluxDB 1.x and 2.x write APIs
InfluxDB Line Protocol support
InfluxQL query support (and the v1 query API)
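For clients already writing to InfluxDB 1.x or 2.x, the compatible write endpoints keep their familiar paths. The sketch below only builds the URLs and sends nothing. The base address and the database name are assumptions for the example, and the paths are the standard v1 (`/write`) and v2 (`/api/v2/write`) write paths.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8181"  # assumption: a locally running server


def v1_write_url(db, precision="ns"):
    """URL for the InfluxDB 1.x-style write endpoint."""
    return f"{BASE_URL}/write?" + urlencode({"db": db, "precision": precision})


def v2_write_url(bucket, precision="ns"):
    """URL for the InfluxDB 2.x-style write endpoint."""
    return f"{BASE_URL}/api/v2/write?" + urlencode({"bucket": bucket, "precision": precision})


print(v1_write_url("sensors"))
print(v2_write_url("sensors"))
```

In both cases the request body is plain Line Protocol sent with an HTTP POST, so existing writers should mostly need only a new host and credentials.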

However, InfluxDB 3 does have some limitations compared to v1 and v2 with respect to how data is ingested. For Core, there is a hard limit of five databases and 2,000 tables across the server. For Enterprise, the limits are 100 databases and 4,000 tables. We’ve done this to limit resource utilization and how many individual Parquet files need to be persisted to object storage on snapshot of the buffer. Depending on how these limits work for our users, we may work to increase these in the future.

While InfluxDB 3 still supports schema on write, it does not support the addition of new tag columns after a table is created. This is because the set of tags and the time represent the primary key in tables. However, new fields can be added at any time. When creating schemas in InfluxDB 3, only unique identifying information for a row should be in a tag, while everything else should be a field. Generally, it will be best to use fields for data, not tags.
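To illustrate the tag-versus-field guidance, here is a small sketch that formats one row as Line Protocol, with identifying columns as tags and measured data as fields. The helper names, measurement name, and column names are made up for the example; the escaping follows the Line Protocol rules as I understand them.

```python
def _escape(text, chars):
    """Backslash-escape each character in `chars` that appears in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _field_value(value):
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one row: tags identify the series, fields carry the data."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    tag_part = "".join(
        f",{_escape(k, ', =')}={_escape(str(v), ', =')}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(k, ', =')}={_field_value(v)}" for k, v in sorted(fields.items())
    )
    return f"{_escape(measurement, ', ')}{tag_part} {field_part} {timestamp_ns}"


line = to_line_protocol(
    "machine_data",
    tags={"site_name": "plant-a", "machine_id": "m1", "sensor_id": "s7"},
    fields={"temperature": 71.5, "status": "ok"},
    timestamp_ns=1737936000000000000,
)
print(line)
```

Because the tag set is fixed once the table exists, a new attribute such as firmware version would be added as a field rather than as a new tag.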

We will be working on data migration tools for InfluxDB Enterprise. Because open source is designed only for data in the last 72 hours, our recommendation for migration to open source is to mirror writes from older versions onto a new running open source instance and then change over after 72 hours. [Note: limit against historical data lifted.]

Unfortunately, we are not able to bring a compatibility layer forward for Flux users at this time. We’re hoping that the combination of the Python plugin system, SQL, and InfluxQL, will give users all the functionality they previously had with Flux.

The FDAP stack: core components of InfluxDB 3

We began developing InfluxDB 3 more than four years ago, building a new Rust-based core around the FDAP stack (Apache Arrow Flight, Apache DataFusion, Apache Arrow, and Apache Parquet). Investing in Apache Software Foundation projects and building InfluxDB 3 around them is one part of our strategy with open source development. We believe open source exists to create widely used commodities—free to use, improve, build on, commercialize, and inspire derivative projects. Specifically, we believe that a state-of-the-art, high-performance SQL engine with parser, planner, optimizer, and vectorized execution should be freely available to any user or company for any purpose without restriction, even if it’s by InfluxDB competitors.

“As we release new versions of InfluxDB built on this technology, we now have a strong tailwind that will continue to drive new features and performance along the way.”

We made the deliberate choice to build around open standards with the goal of having broader compatibility with third-party projects and products. This led to SQL for the query language, Flight and Arrow for RPC, and Parquet for the file format. The choice of Parquet has become even more important with the rise of the Iceberg Catalog format. We recognized the importance of this and even contributed nanosecond timestamps to the Iceberg Spec and implementation, to support the precisions that InfluxDB requires.

Over the last 4.5 years, we’ve helped build DataFusion into the high-performance columnar query engine it is today. Along the way, we developed, open sourced, and donated to the ASF the object store abstraction that gives DataFusion the ability to execute against files in the object stores of any of the major cloud providers. The results of our efforts and many contributors around DataFusion, led by InfluxData Staff Engineer and PMC Chair Andrew Lamb, can be seen in the SIGMOD paper from last year, and DataFusion’s recent spot atop the rankings of single node query engines against Parquet files.

This pace of innovation is only accelerating because of DataFusion’s home in the ASF. It’s what makes it strategically safe for companies of all sizes to contribute to and improve DataFusion. The largest companies in the world and startups of all kinds are not only using DataFusion, but also improving the performance, features, and reliability of the engine itself. These advancements flow directly into InfluxDB 3, continuously improving its performance and giving us the best possible outcome we could have hoped for when we embarked on this journey. It’s an unbeatable strategy compared to closed, proprietary software—we accelerate maturity by years.

From the start, our goal was for the core engine to be adopted by as many users and companies as possible, even beyond InfluxDB itself. This broad adoption fosters a larger pool of contributors who push the boundaries of innovation while creating more robust software. It’s battle-tested in many different environments, for many use cases, and with all kinds of data. As we release new versions of InfluxDB built on this technology, we now have a strong tailwind that will continue to drive new features and performance along the way.

Our open source philosophy

With today’s announcement, we are continuing our commitment to open source and maintaining a clear separation between our open source projects and commercial offerings. Rather than restricting usage through licensing, we’ve chosen to differentiate through architectural decisions that benefit both our open source users and commercial customers. We believe this approach fosters a more vibrant community while ensuring we can continue investing in open source development for the long term.

The decision to focus Core on recent data reflects a careful balance of technical and business considerations. Core’s recent-data optimization isn’t just a commercial boundary – it’s an architectural choice that enables better performance, reliability, and simplicity for the most common time series workloads. By optimizing for the most common use cases, we’ve created a system that delivers exceptional performance while remaining truly open source under permissive licensing. This focused approach allows us to ensure reliable operation by avoiding the complexity of compaction in the open source offering. Furthermore, it encourages ecosystem integration by making it simple for users to combine Core with their choice of third-party tools for historical analysis.

Ultimately, we’ll iterate on feedback from our community and our customers. We want to ensure that some version of InfluxDB will still serve at-home and side-project use cases. Depending on the feedback we receive, we may open up a not-for-commercial-use tier for Enterprise that is free to use.

Development timeline

The alpha period will focus on extensive testing and performance validation, integrating community feedback, refining the API, and enhancing operational experience. During this period, we may make breaking changes to file formats or APIs. Our goal is to transition into a beta in early March, which would mark the end of any potential breaking changes. A general release is planned for April, subject to learnings from the alpha and beta periods.

Joining the community and giving feedback

This is just the start of an ongoing journey, with continuous development and iteration happening in the open. Check out our getting started guide for Core and Enterprise, and join the following channels to give feedback:

Discord: Join #influxdb3_core on the InfluxDB Discord for direct interaction with our development team
Community Site
Reddit: r/InfluxDB
Slack: #influxdb3_core channel

The alpha releases of InfluxDB 3 Core and Enterprise represent our vision for the future of time series data and our commitment to the open source community. We look forward to your feedback and participation in shaping the future of InfluxDB.

The Plan for InfluxDB 3 Open Source

Paul Dix (InfluxData) — Thu, 21 Sep 2023 04:00:00 +0000

The commercial version of InfluxDB 3 is a distributed, scalable time series database built for real-time analytic workloads. It supports infinite cardinality, SQL and InfluxQL as native query languages, and manages data efficiently in object storage as Apache Parquet files. It delivers significant gains in ingest efficiency, scalability, data compression, storage costs, and query performance on higher cardinality data. So far this year we’ve announced the availability of InfluxDB 3 in three separate flavors: InfluxDB Cloud Serverless (multi-tenant, usage-based billing for smaller workloads), InfluxDB Cloud Dedicated (managed single-tenant offering for medium to large workloads), and InfluxDB Clustered (self-managed for medium to large workloads). In this post, we’re announcing our plan to deliver an open source InfluxDB 3, which we’re calling InfluxDB Edge.

Talking about open source InfluxDB 3 pulls the thread on many other topics that people will likely have questions about as a result. So we’ll go into detail on multiple related topics, but here are the highlights:

InfluxDB 3 open source will be called InfluxDB Edge, with development happening in the existing InfluxDB repo, continuing under a permissive MIT or Apache 2 license.
After InfluxDB Edge is released, we will create a free community edition named InfluxDB Community with additional features not in Edge (this development effort will not be in the InfluxDB repo).
InfluxDB Community will be upgradeable to a commercial version of InfluxDB with features not available in either Edge or Community.
The InfluxDB IOx repo has been copied over to the InfluxDB repo under this commit. The IOx repo will be made private in a week.
Flux is in maintenance mode. We will continue to support and run it for our customers with security and critical fixes, but our current focus is on our core SQL and InfluxQL query engine.

I’ll cover each of these topics in the following sections along with a reflection on the development of InfluxDB 3 (previously named IOx), Flux and how we got here. There are headings for each section to make it easier to skip ahead if the later parts of this post are of more interest to you.

InfluxDB Edge: Open Source InfluxDB 3

InfluxDB Edge will be a standalone process optimized for providing a queryable, real-time buffer of time series and observational data of all kinds stored as Parquet files in either object storage or local disk. It will have an embedded VM for connecting to third-party systems to pull data into its buffer or for transforming and acting on data as it arrives, periodically on a schedule, or when data is persisted in Parquet files.

We believe that Parquet—as a standard format for observational and analytic data of all kinds—will be transformational for data science, analytics, sensor data, data warehousing, and important data tasks of all kinds. What is lacking at the moment is an easy way to get data into this format while having it available for query before that data is written into larger immutable Parquet files. We think that InfluxDB Edge can serve as a time series database for the leading edge of data while making this data available to third-party systems to collaborate and build around Parquet and object storage.

From an API perspective it will support the InfluxDB 1.x and 2.x write APIs with Line Protocol, the InfluxQL query API (same as in both previous major InfluxDB versions), and all new APIs specifically built for 3, including the ability to query with industry standard SQL via FlightSQL or InfluxQL via Apache Arrow Flight. For those familiar with InfluxDB 1.x and 2.x, this should sound similar in some respects to the prior versions, but also very different at the same time.

The database architecture for InfluxDB 3 doesn’t include the inverted index (TSI) or the Time-Structured Merge Tree (TSM) storage engine that InfluxDB 1.x and 2.x were built around. Its storage system is designed to organize data in bulk chunks that can be quickly processed and kept in highly compressed Parquet files. This means that it is optimized for queries against the leading edge of data, and for time series and analytic queries in particular. InfluxDB 3 Edge will not include a compactor for re-organizing the data for deletes or query optimization over longer time periods, which means its sweet spot will be for collecting and querying recent data.

We don’t intend for InfluxDB 3 Edge to be a replacement or “light” version of our commercial clustered, distributed database offerings, or a full replacement for all use cases of InfluxDB 1.x or 2.x open source. There will be some intersection in functionality, but over time, it will fill a different spot in the tool belt and infrastructure of any company working with time series data at scale. We intend for InfluxDB 3 Edge to fill some of the same needs as previous versions while also expanding out into new territory. The inclusion of an embedded VM will make InfluxDB Edge a powerful agent for collecting, processing, and monitoring data in addition to being a leading edge time series database.

InfluxDB Community: the successor to InfluxDB 1.x and 2.x

After the initial release of Edge, we intend to release another version of InfluxDB 3 that will be useful for time series workloads on more historical and longer time frames of data: InfluxDB Community. It will be free to use and upgradable to a commercial version named simply InfluxDB. The free-to-use version will include functionality like a compactor, which will add capabilities for deletes and re-organizing files to optimize for queries on longer time ranges of data than InfluxDB Edge. For the InfluxDB 1.x and 2.x users that don’t quite fit within the capabilities of Edge, Community will be the tool of choice.

The commercial single-server version of InfluxDB 3 is likely to include features such as:

Integration with third-party authentication providers
Attribute- and role-based access control (ABAC & RBAC)
Replicas for high availability
Federated query across multiple Edge or Community nodes

Our intent is to enable as much of the 1.x and 2.x open source user base to migrate over to either Edge or the free Community version as possible, while maintaining our ability to ship a commercial version of the single server InfluxDB. If you’re interested in getting updates about this upcoming version of InfluxDB, sign up here.

Different projects for different use cases

With this announcement today, we’re laying out the long-term vision for our product line and where we expect to land different features. We’ve defined the following products:

InfluxDB Edge (MIT/Apache2 open source, next product to release)
InfluxDB Community (free to use, release after Edge)
InfluxDB (paid license, release with or after Community)
InfluxDB Clustered (self-managed, annual subscription, available now)
InfluxDB Cloud Serverless (multi-tenant, usage billing, available now)
InfluxDB Cloud Dedicated (single-tenant, resource billing, available now)

All these products will support the InfluxDB 1.x and 2.x write APIs, the InfluxQL query API, FlightSQL, and future 3 APIs related to writing data, querying, and background processing via the embedded VM. These APIs and the InfluxDB data model form the set of common interfaces across all these products. Additionally, Parquet as a format for sharing data in bulk enables movement of data from one product to another.

InfluxDB Edge will be for collecting and transforming time series and observational data while providing a leading edge real-time database. It will be useful at the Edge, but also within the data center. It can run on its own or as part of a larger infrastructure that has many Edge nodes sending data to larger InfluxDB Dedicated or Clustered installations.

InfluxDB Community will provide all the functionality of Edge, but also make queries over longer time ranges of data more efficient while adding delete capabilities. We expect that a number of users of InfluxDB 1.x and 2.x will require these features before they can make the upgrade to 3. This will provide them with a free pathway to do so when we release it after the initial release of Edge. Community is useful as a historical time series database where high availability or scale are not a concern.

The paid edition of InfluxDB will provide all the functionality of Edge and Community while adding features for high availability and security for groups working with the database. InfluxDB Community will be able to have the paid features turned on through licensing. The commercial version of InfluxDB single server will be ideal for environments that do not require scale-out and prefer to run on bare VMs without the overhead and complexity of Kubernetes. For small- to medium-sized production workloads that require security or high availability, this will be an ideal choice.

Finally, InfluxDB Cloud Dedicated and InfluxDB Clustered represent our flagship distributed, dynamically scalable, secure, and most robust database offerings. Based on the same InfluxDB distributed core, these products run inside Kubernetes with workload isolation separating ingest, query, and compaction from each other. All service tiers can scale independently from each other, and we plan to add distributed caching and query workload isolation in future versions. For environments that span multiple teams using the same backend, or medium to larger workloads, InfluxDB Cloud Dedicated or InfluxDB Clustered will be the ideal choice.

The history of InfluxDB 3 (formerly IOx)

Initially, we started the development of InfluxDB 3 in early 2020 as a research project to answer a few questions:

What would a new database architecture look like that supported infinite cardinality with data kept in object storage?
Could we build around an existing SQL engine to add support for the language and get performance wins?
What standards could we build around to enable more third-party integrations and compatibility with a broader ecosystem of tools?

As we looked into the changes we’d need to make to accomplish all these goals, we realized that we were looking at a near total rewrite of the core database. InfluxDB, up to this point, was written in Go with a database architecture that combined two kinds of databases into one: an inverted index and a time series store. We realized that this format wouldn’t work to serve the more analytical workloads we had in mind for future versions of InfluxDB.

When we announced we were working on a big update to InfluxDB in November of 2020 we called the project InfluxDB IOx, a new core for InfluxDB written in Rust, built with Apache Arrow, Apache DataFusion, Apache Parquet, and Arrow Flight. At that stage it was still a very early project with a long development path ahead. Over time, our choice of foundational tools evolved into a sophisticated stack for building analytic systems. We think that these building blocks are the future of open data systems, real-time analytics, lakehouse, and data warehouse architectures.

At the time we said that we’d build it as two pieces of software: an open source, shared-nothing data plane and a commercial closed source control plane, which we’d offer as a cloud-hosted product or self-managed software. Over the next three years of software development, we changed the architecture dramatically. As we made these changes, we did so in the open in the InfluxDB IOx repo.

While we’ve done this development, we’ve been unclear about what would ultimately be in the InfluxDB 3 open source builds. Today, with this announcement, we’re stating what we intend to include in the open source. As a first step, we’ve copied all the code from the IOx repo into the main branch (the new default) of the InfluxDB open source repo, which continues under a permissive MIT & Apache2 license. A week from today we’ll be closing out the IOx repo. For anyone that was pulling code from that repo, as of today they should point at this commit in the InfluxDB repo.

What is in the IOx repo is not what we intend to put in the final InfluxDB 3 builds, but we wanted to move that code over to a single point where anyone who was depending on it can reference it. Many of the libraries in the IOx code base will form the basis of InfluxDB 3 Edge. As of today, the main branch in the InfluxDB repo is the home for our open source efforts.

Ultimately, our vision of an open data plane and a commercial control plane wasn’t viable due to necessary architecture changes, so we had to rethink what InfluxDB 3 would be. In the time we’ve been developing this new version of InfluxDB, we’ve seen Parquet get broader adoption. What seems to be missing at the moment is more useful tooling for gathering and transforming data into Parquet files. Given the strength of the format and its increasing use in data and analytic systems, we think the time is right for InfluxDB 3 Edge to help users gather and query their data in real-time as it gets stored into Parquet files.

Flux in maintenance mode

Flux is the custom scripting and query language we developed as part of our effort on InfluxDB 2.0. While we will continue to support Flux for our customers, it is noticeably absent from the description of InfluxDB 3. We built Flux, which is written in Go, hoping it would get broad adoption and empower users to do things with the database that were previously impossible. While we delivered a powerful new way to work with time series data, many users found Flux to be an adoption blocker for the database.

We spent years of developer effort on Flux starting in 2018. The size of the effort – including creating a new language, VM, query planner, parser, optimizer, and execution engine – was significant. We ultimately weren’t able to devote the kind of attention we would have liked to more language features, tooling, and overall usability and developer experience. We worked constantly on performance, but because we were building everything from scratch, all the effort was solely on the shoulders of our small team. We think this ultimately kept us from working on the kinds of usability improvements that would have helped Flux gain broader adoption.

For InfluxDB 3 we had a thesis that building on top of an existing engine would enable us to go faster and deliver more features with better performance over time. We decided on Apache Arrow DataFusion, an existing query parser, planner, and executor. It was a project still in its early stages in mid-2020 when we made this choice, but over the course of the last three years, there have been significant contributions from an active and growing community. While we remain major contributors to the project, it is continuously getting feature enhancements and performance improvements from a worldwide pool of developers. Our efforts on the Flux implementation would simply not be able to keep pace with the much larger group of DataFusion developers.

With InfluxDB 3 being a ground-up rewrite of the database in a new language (from Go to Rust), we weren’t able to bring the Flux implementation along. For InfluxQL we were able to support it natively by writing a language parser in Rust and then converting InfluxQL queries into logical plans that our new native query engine, Apache Arrow DataFusion, can understand and process. We also had to add new capabilities to the query engine to support some of the time series queries that InfluxQL enables. This is an effort that took over a year and is still ongoing. This approach means that the contributions to DataFusion also become improvements to InfluxQL given they share the underlying engine.

Initially, our plan to support Flux in 3 was to do so through a lower-level API that the database would provide. In our Cloud2 product, Flux processes connect to the InfluxDB 1 & 2 TSM storage engine through a gRPC API. We built support for this in InfluxDB 3 and started testing with mirrored production workloads. We quickly found that this interface performed poorly and had unforeseen bugs, eliminating it as a viable option for Flux users to bring their scripts over to 3. This is due to the API being designed around the TSM storage engine’s very specific format, which the 3 engine is unable to serve up as quickly.

We’ll continue to support Flux for our users and customers. Given the broad scope of Flux as a scripting language in addition to being a query language, planner, optimizer, and execution engine, a Rust-native version of it is likely out of reach. And because the surface area of the language is so large, such an effort would be unlikely to yield a version that is compatible enough to run existing Flux queries without modification or rewrites, which would eliminate the point of the effort to begin with.

For Flux to have a path forward, we believe the best plan is to update the core engine so that it can use FlightSQL to talk to InfluxDB 3. This would make an architecture where independent processes that serve the InfluxDB 2.x query API (i.e., Flux) would be able to convert whatever portion of a Flux script that is a query into a SQL query. That query would then get sent to the InfluxDB 3 process with the result being post processed by the Flux engine.

This is likely not a small effort as the Flux engine is built around InfluxDB 2.0’s TSM storage engine and the representation of all data as individual time series. InfluxDB 3 doesn’t keep a concept of series so the SQL query would either have to do a bunch of work to return individual series, or the Flux engine would do work with the resulting query response to construct the series. For the moment, we’re focused on improvements to the core SQL (and by extension InfluxQL) query engine and experience both in InfluxDB 3 and DataFusion.

We may come back to this effort in the future, but we don’t want to stop the community from self-organizing an effort to bring Flux forward. The Flux runtime and language exists as permissively licensed open source here. We’ve also created a community fork of Flux where the community can self-organize and move development forward without requiring our code review process. There are already a few community members working on this potential path forward. If you’re interested in helping with this effort, please speak up on this tracked issue.

We realize that Flux still has an enthusiastic, if small, user base and we’d like to figure out the best path forward for these users. For now, with our limited resources, we think focusing our efforts on improvements to Apache Arrow DataFusion and InfluxDB 3’s usage of it is the best way to serve our users that are willing to convert to either InfluxQL or SQL. In the meantime, we’ll continue to maintain Flux with security and critical fixes for our users and customers.

Continued commitment to open source

With InfluxDB 3 built around Apache Arrow, Apache DataFusion, Apache Parquet, and FlightSQL, we’ve expanded our commitment to open source. We actively contribute to, and in some cases lead, those upstream projects in addition to our efforts on InfluxDB 3. When we made the bet on these projects as the core of InfluxDB 3 in the summer of 2020, it wasn’t yet obvious that they would be adopted and contributed to as broadly as they have been.

We think that the Apache Arrow ecosystem of tools, Parquet, DataFusion, and Rust will form the basis of OLAP and large-scale data processing systems of the future. In addition to InfluxDB 3, we’re putting our open source efforts into these standards so that the community continues to grow and the Apache Arrow set of projects gets easier to use with more features and functionality.

We’re very excited about the future of InfluxDB Edge and hope you’ll follow along with the effort on the open source InfluxDB repo.

InfluxDB 3: System Architecture

Nga Tran, Paul Dix, Andrew Lamb, Marko Mikulicic (InfluxData) — Tue, 27 Jun 2023 07:35:00 +0000

InfluxDB 3 (previously known as InfluxDB IOx) is a (cloud) scalable database that offers high performance for both data loading and querying, and focuses on time series use cases. This article describes the system architecture of the database.

Figure 1 shows the architecture of InfluxDB 3, which includes four major components and two main storage systems.

The four components each operate almost independently and are responsible for:

data ingestion illustrated in blue,
data querying demonstrated in green,
data compaction shown in red, and
garbage collection drawn in pink respectively.

Of the two storage systems, one, named Catalog, is dedicated to the cluster metadata. The other, named Object Storage, is much larger and stores the actual data in a service such as Amazon S3. In addition to these main storage locations, there are much smaller data stores called Write Ahead Logs (WALs), which the ingestion component uses only for crash recovery during data loading.

The arrows in the diagram show the data flow direction; how the components communicate to pull or push the data is beyond the scope of this article. For data already persisted, we designed the system to have the Catalog and Object Storage as the only state, so that each component only needs to read from these two stores without communicating with other components. For the not-yet-persisted data, the data ingestion component manages the state and sends it to the data querying component when a query arrives. Let us delve into this architecture by going through each component one by one.

Figure 1: InfluxDB 3.0 Architecture

Data ingestion

Figure 2 demonstrates the design of the data ingestion in InfluxDB 3. Users write data to the Ingest Router, which shards the data to one of the Ingesters. The number of ingesters in the cluster can be scaled up and down depending on the data workload. We use these scaling principles to shard the data. Each ingester has an attached storage volume, such as Amazon EBS, used as a write ahead log (WAL) for crash recovery.

Each ingester performs these major steps:

Identify tables of the data: Unlike many other databases, users do not need to define their tables and their column schema before loading data into InfluxDB. They will be discovered and implicitly added by the ingester.
Validate data schema: The data types provided in a user’s write are strictly validated synchronously with the write request. This prevents type conflicts propagating to the rest of the system and provides the user with instantaneous feedback.
Partition the data: In a large-scale database such as InfluxDB, there are a lot of benefits to partitioning the data. The ingester is responsible for the partitioning job, and currently it partitions the data by day on the ‘time’ column. If the ingested data has no time column, the Ingest Router implicitly adds it and sets its value to the data loading time.
Deduplicate the data: In time series use cases, it is common to see the same data ingested multiple times, so InfluxDB 3 performs the deduplication process. The ingester builds an efficient multi-column sort merge plan for the deduplication job. Because InfluxDB uses DataFusion for its Query Execution and Arrow as its internal data representation, building a sort merge plan involves simply putting DataFusion’s sort and merge operators together. Running that sort merge plan effectively on multiple columns is part of the work the InfluxDB team contributed to DataFusion.
Persist the data: The processed and sorted data is then persisted as a Parquet file. Because data is encoded and compressed very effectively when it is sorted on the lowest-cardinality columns, the ingester finds and picks the lowest-cardinality columns for the sort order of the sort mentioned above. As a result, the size of the file is often 10-100x smaller than its raw form.
Update the Catalog: The ingester then records the newly created file in the Catalog. This is a signal to let the other two components, Querier and Compactor, know that new data has arrived.
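To make the partition, deduplicate, and sort steps above more concrete, here is a minimal Python sketch. It is not the ingester’s implementation: plain dictionaries and `sorted` stand in for DataFusion’s sort and merge operators. The function names (`partition_key`, `sort_order`, `ingest`), the row layout, the nanosecond timestamps, and the rule that rows sharing the same tag values and timestamp are duplicates (with the later write winning) are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(ts_ns):
    """Return the UTC day (YYYY-MM-DD) that a nanosecond timestamp falls in."""
    seconds = ts_ns // 1_000_000_000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")


def sort_order(rows, tag_columns):
    """Order tag columns from lowest to highest cardinality, with time last."""
    cardinality = {c: len({r[c] for r in rows}) for c in tag_columns}
    return sorted(tag_columns, key=lambda c: (cardinality[c], c)) + ["time"]


def ingest(rows, tag_columns):
    """Partition rows by day, deduplicate each partition, then sort it."""
    partitions = defaultdict(dict)
    for row in rows:
        # Assumed duplicate rule: same tag values and same timestamp.
        key = tuple(row[c] for c in tag_columns) + (row["time"],)
        # A later write with the same key replaces the earlier one.
        partitions[partition_key(row["time"])][key] = row

    result = {}
    for day, deduped in partitions.items():
        batch = list(deduped.values())
        order = sort_order(batch, tag_columns)
        result[day] = sorted(batch, key=lambda r: tuple(r[c] for c in order))
    return result
```

Each value in the returned dictionary corresponds to the rows that would be written to one file for that day. Sorting on the lowest-cardinality columns first groups repeated values together, which is what lets the encoding compress them well.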

Even though the ingester performs many steps, InfluxDB 3 optimizes the write path, keeping write latency minimal, on the order of milliseconds. This may lead to a lot of small files in the system. However, we do not keep them around for long. The compactors, described in a later section, compact these files in the background.

The ingesters also support fault tolerance, which is beyond the scope of this article. The detailed design and implementation of ingesters deserve their own blog posts.

Figure 2: Data Ingestion

Data querying

Figure 3 shows how InfluxDB 3 queries data. Users send a SQL or an InfluxQL query to the Query Router, which forwards it to a Querier. The querier reads the needed data, builds a plan for the query, runs the plan, and returns the result to the users. The number of queriers can be scaled up and down depending on the query workload using the same scaling principles used in the design of the ingesters.

Each querier performs these major tasks:

Cache metadata: To support a high query workload effectively, the querier keeps synchronizing its metadata cache with the central catalog to have up-to-date tables and their ingested metadata.
Read and cache data: When a query arrives, if its data is not available in the querier’s data cache, the querier reads the data into the cache first because we know from statistics that the same files will be read multiple times. The querier caches only the content of the file needed to answer the query; the parts of the file that the querier’s pruning strategy rules out are never cached.
Get not-yet-persisted data from ingesters: Because there may be data in the ingesters not yet persisted into the Object Storage, the querier must communicate with the corresponding ingesters to get that data. From this communication, the querier also learns from the ingester whether there are newer tables and data to invalidate and update its caches to have an up-to-date view of the whole system.
Build and execute an optimal query plan: Like many other databases, the InfluxDB 3 Querier contains a Query Optimizer. The querier builds the best-suited query plan (aka optimal plan) that executes on the data from the cache and ingesters, and finishes in the least amount of time. Similar to the design of the ingester, the querier uses DataFusion and Arrow to build and execute custom query plans for SQL (and soon InfluxQL). The querier takes advantage of the data partitioning done in the ingester to parallelize its query plan and prune unnecessary data before executing the plan. The querier also applies common techniques of predicate and projection pushdown to further prune data as soon as possible.
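The pruning and projection ideas in the last task can be illustrated with a small Python sketch. The real querier does this inside DataFusion query plans; here, `min_time` and `max_time` are hypothetical per-file fields standing in for whatever file metadata the catalog tracks, and `prune_files` and `project` are illustrative names rather than InfluxDB APIs.

```python
def prune_files(files, query_start, query_end):
    """Keep only files whose [min_time, max_time] range intersects
    the half-open query range [query_start, query_end)."""
    return [
        f for f in files
        if f["max_time"] >= query_start and f["min_time"] < query_end
    ]


def project(rows, columns):
    """Keep only the columns the query asks for (projection)."""
    return [{c: row[c] for c in columns} for row in rows]
```

Files that fall entirely outside the query’s time range are never read, and columns the query does not reference are dropped as early as possible, so later steps of the plan handle less data.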

Even though the data in each file does not itself contain duplicates, data in different files and not-yet-persisted data sent to the querier from the ingesters may include duplicates. Thus the deduplication process is also necessary at query time. Similar to the ingester, the querier uses the same multi-column sort merge operators described above for the deduplication job. Unlike the plan built for the ingester, these operators are just a part of a bigger and more complex query plan built to execute the query. This ensures the data streams through the rest of the plan after deduplication.

It is worth noting that even with an advanced multi-column sort merge operator, its execution cost is not trivial. The querier further optimizes the plan to deduplicate only overlapped files, in which duplicates may occur. Furthermore, to provide high query performance in the querier, InfluxDB 3 avoids as much deduplication as possible during query time by compacting data beforehand. The next section describes the compaction process.
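One way to picture the “only deduplicate overlapped files” optimization is to group files by overlapping time ranges and run deduplication only on groups with more than one file. The sketch below assumes each file carries hypothetical `min_time` and `max_time` fields; `overlap_groups` and `needs_dedup` are illustrative names, not part of InfluxDB.

```python
def overlap_groups(files):
    """Group files whose [min_time, max_time] ranges overlap, directly or
    through a chain of other files."""
    groups = []
    current, current_max = [], None
    for f in sorted(files, key=lambda f: f["min_time"]):
        if current and f["min_time"] <= current_max:
            # This file starts before the current group ends: same group.
            current.append(f)
            current_max = max(current_max, f["max_time"])
        else:
            if current:
                groups.append(current)
            current, current_max = [f], f["max_time"]
    if current:
        groups.append(current)
    return groups


def needs_dedup(group):
    """A file that overlaps no other file cannot hold cross-file duplicates."""
    return len(group) > 1
```

A group of one can be scanned directly, so the cost of the sort merge operator is paid only where duplicates are actually possible.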

The detailed design and implementation of the querier tasks described briefly above deserve their own blog posts.

Figure 3: Data Querying

Data compaction

As described in the “Data ingestion” section, to reduce the ingest latency, the amount of data processed and persisted into each file by an ingester is minimal. This leaves many small files stored in the Object Storage, which in turn create significant I/O during query time and reduce query performance. Furthermore, as discussed in the “Data querying” section, overlapped files may contain duplicates that need deduplication during query time, which also reduces query performance. The job of data compaction is to compact the many small files written by the ingesters into fewer, larger, and non-overlapped files to improve query performance.

Figure 4 illustrates the architecture of the data compaction, which includes one or many Compactors. Each compactor runs a background job that reads newly ingested files and compacts them together into fewer, larger, and non-overlapped files. The number of compactors can be scaled up and down depending on the compacting workload, which is a function of the number of tables with new data files, the number of new files per table, how large the files are, how many existing files the new files overlap with, and how wide a table is (aka how many columns are in a table).

In the article, Compactor: A hidden engine of database performance, we described the detailed tasks of a compactor: how it builds an optimized deduplication plan that merges data files, the sort order of different-column files that helps with the deduplication, using compaction levels to achieve non-overlapped files while minimizing recompactions, and building an optimized deduplication plan on a mix of non-overlapped and overlapped files in the querier.

Like the design of the ingester and querier, the compactor uses DataFusion and Arrow to build and execute custom query plans. Actually, all three components share the same compaction sub-plan that covers both data deduplication and merge.
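As a rough illustration of a combined merge and deduplication step, the Python sketch below merges files that are already sorted on the same key and keeps the newest row for each key. It uses the standard library’s `heapq.merge` in place of DataFusion’s operators. The name `dedup_merge`, the row layout, and the rule that rows from later files override earlier ones are assumptions for the example.

```python
import heapq
from itertools import groupby


def dedup_merge(files, key_columns):
    """Merge sorted files (ordered oldest to newest) and keep the newest
    row for every key."""
    def key_of(row):
        return tuple(row[c] for c in key_columns)

    # Tag each row with its key and file index so newer files sort last
    # among rows that share a key.
    tagged = [
        [(key_of(row), index, row) for row in rows]
        for index, rows in enumerate(files)
    ]
    merged = heapq.merge(*tagged, key=lambda t: (t[0], t[1]))

    # Within each key group, the final entry comes from the newest file.
    return [list(group)[-1][2] for _, group in groupby(merged, key=lambda t: t[0])]
```

Because every input is sorted the same way, the merge streams through the files once and never needs to re-sort the combined data.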

The small and/or overlapped files compacted into larger and non-overlapped files must be deleted to reclaim space. To avoid deleting a file that is being read by a querier, the compactor never hard deletes any files. Instead, it marks the files as soft deleted in the catalog, and another background service named Garbage Collector eventually deletes the soft deleted files to reclaim storage.

Figure 4: Data Compaction

Garbage collection

Figure 5 illustrates the design of InfluxDB 3.0 garbage collection, which is responsible for data retention and space reclamation. The Garbage Collector runs scheduled background jobs that soft delete and hard delete data.

Data retention:

InfluxDB provides an option for users to define their data retention policy and save it in the catalog. The scheduled background job of the garbage collector reads the catalog for tables that are outside the retention period and marks their files as soft deleted in the catalog. This signals the queriers and compactors that these files are no longer available for querying and compacting, respectively.

Space reclamation:

Another scheduled background job of the garbage collector reads the catalog for metadata of the files that were soft deleted a certain time ago. It then removes the corresponding data files from the Object Storage and also removes the metadata from the Catalog.

Note that the soft deleted files come from different sources: compacted files deleted by the compactors, files outside the retention period deleted by the garbage collector itself, and files deleted through a delete command that InfluxDB 3 plans to support in the future. The hard delete job does not need to know where the soft deletes come from and treats them all the same.
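The two background jobs can be sketched in a few lines of Python. Here the catalog and the object store are plain dictionaries keyed by file path, and `max_time`, `deleted_at`, `retention`, and `grace` are hypothetical names chosen for the example; the actual catalog schema and scheduling are not described in this article.

```python
def apply_retention(catalog, now, retention):
    """Soft delete files whose newest row is older than the retention period."""
    for meta in catalog.values():
        if meta["deleted_at"] is None and meta["max_time"] < now - retention:
            meta["deleted_at"] = now


def reclaim_space(catalog, object_store, now, grace):
    """Hard delete files that were soft deleted at least `grace` ago.

    The job does not care why a file was soft deleted."""
    expired = [
        path for path, meta in catalog.items()
        if meta["deleted_at"] is not None and now - meta["deleted_at"] >= grace
    ]
    for path in expired:
        # Ordering is a choice made in this sketch: data file first,
        # then its catalog entry.
        object_store.pop(path, None)
        del catalog[path]
    return expired
```

The grace period between the soft and the hard delete gives any querier that is still reading a file time to finish before the file disappears.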

Soft and hard deletes are another large topic that involves work in the ingesters, queriers, compactors, and garbage collectors, and it deserves its own blog post.

Figure 5: Garbage Collection

InfluxDB 3 cluster setup

Other than the queriers making requests to their corresponding ingesters for not-yet-persisted data, the four components do not talk with each other directly. All communication is done via the Catalog and Object Storage. The ingesters and queriers do not even know of the existence of the compactors and garbage collector. However, as emphasized above, InfluxDB 3 is designed to have all four components co-exist to deliver a high performance database.

In addition to those major components, InfluxDB also has other services such as Billing to bill customers based on their usage.

Catalog Storage

The InfluxDB 3.0 Catalog includes metadata about the data, such as databases (aka namespaces), tables, columns, and file information (e.g., the file location, size, row count, etc.). InfluxDB uses a Postgres-compatible database to manage its catalog. For example, a local cluster setup can use PostgreSQL while the AWS cloud setup can use Amazon RDS.

Object Storage

InfluxDB 3 data storage only contains Parquet files which can be stored on local disk for local setup and in Amazon S3 for AWS cloud setup. The database also works on Azure Blob Storage and Google Cloud Storage.

InfluxDB 3 cluster operation

InfluxDB 3 customers can set up multiple dedicated clusters, each operating independently to avoid “noisy neighbor” issues and contain potential reliability problems. Every cluster utilizes its own dedicated computational resources and can function on single or multiple Kubernetes clusters. This isolation also contains the potential blast radius of reliability issues that could emerge within a cluster due to activities in another.

Our innovative approach to infrastructure upgrades utilizes in-place updates of entire Kubernetes clusters. The fact that most of the state in the InfluxDB 3 cluster is stored outside the Kubernetes clusters, such as in S3 and RDS, facilitates this process.

Our platform engineering system allows us to orchestrate operations across hundreds of clusters and offers customers control over specific cluster parameters that govern performance and costs. Continuous monitoring of each cluster’s health is part of our operations, allowing a small team to manage numerous clusters effectively in a rapidly evolving software environment.

Compactor: A Hidden Engine of Database Performance

Paul Dix, Nga Tran (InfluxData) — Mon, 27 Mar 2023 07:00:00 +0000

This article was originally published in InfoWorld and is reposted here with permission.

The compactor handles critical post-ingestion and pre-query workloads in the background on a separate server, enabling low latency for data ingestion and high performance for queries.

The demand for high volumes of data has increased the need for databases that can handle both data ingestion and querying with the lowest possible latency (aka high performance). To meet this demand, database designs have shifted to prioritize minimal work during ingestion and querying, with other tasks being performed in the background as post-ingestion and pre-query.

This article will describe those tasks and how to run them on a completely separate server to avoid sharing resources (CPU and memory) with the servers that handle data loading and reading.

Tasks of post-ingestion and pre-query

The tasks that can proceed after the completion of data ingestion and before the start of data reading will differ depending on the design and features of a database. In this post, we describe the three most common of these tasks: data file merging, delete application, and data deduplication.

Data file merging

Query performance is an important goal of most databases, and good query performance requires data to be well organized, for example sorted and encoded (aka compressed) or indexed. Because query processing can handle encoded data without decoding it, and because the less I/O a query needs, the faster it runs, encoding a large amount of data into a few large files is clearly beneficial. In a traditional database, the process that organizes data into large files is performed at load time by merging ingested data with existing data. Sorting and encoding or indexing are also needed during this data organization. Hence, for the rest of this article, we’ll discuss the sort, encode, and index operations hand in hand with the file merge operation.

Fast ingestion has become more and more critical to handling large and continuous flows of incoming data and near real-time queries. To support fast performance for both data ingestion and querying, newly ingested data is not merged with the existing data at load time but is stored in a small file (or a small chunk in memory, in the case of a database that only supports in-memory data). The file merge is performed in the background as a post-ingestion and pre-query task.

A variation of the LSM tree (log-structured merge-tree) technique is usually used to merge these files. With this technique, the small file that stores the newly ingested data should be organized (e.g., sorted and encoded) the same way as the existing data files, but because it holds a small set of data, sorting and encoding that file is cheap. The reason to have all files organized the same way is explained in the section on data compaction below.

Refer to this article on data partitioning for examples of data-merging benefits.

Delete application

Similarly, the process of data deletion and update needs the data to be reorganized and takes time, especially for large historical datasets. To avoid this cost, data is not actually deleted when a delete is issued but a tombstone is added into the system to ‘mark’ the data as ‘soft deleted’. The actual delete is called ‘hard delete’ and will be done in the background.

Updating data is often implemented as a delete followed by an insert, so its processing and background tasks are the same as those of data ingestion and deletion.

Data deduplication

Time series databases such as InfluxDB accept ingesting the same data more than once but then apply deduplication to return non-duplicate results. Specific examples of deduplication applications can be found in this article on deduplication. Like data file merging and deletion, deduplication needs to reorganize data and is thus an ideal task to perform in the background.

Data compaction

The background tasks of post-ingestion and pre-query are commonly known as data compaction because the output of these tasks typically contains less data and is more compressed. Strictly speaking, the “compaction” is a background loop that finds the data suitable for compaction and then compacts it. However, because there are many related tasks as described above, and because these tasks usually touch the same data set, the compaction process performs all of these tasks in the same query plan. This query plan scans data, finds rows to delete and deduplicate, and then encodes and indexes them as needed.

Figure 1 shows a query plan that compacts two files. A query plan in the database is usually executed in a streaming/pipelining fashion from the bottom up, and each box in the figure represents an execution operator. First, data of each file is scanned concurrently. Then tombstones are applied to filter deleted data. Next, the data is sorted on the primary key (aka deduplication key), producing a set of columns before going through the deduplication step that applies a merge algorithm to eliminate duplicates on the primary key. The output is then encoded and indexed if needed and stored back in one compacted file. When the compacted data is stored, the metadata of File 1 and File 2 stored in the database catalog can be updated to point to the newly compacted data file and then File 1 and File 2 can be safely removed. The task to remove files after they are compacted is usually performed by the database’s garbage collector, which is beyond the scope of this article.

Figure 1: The process of compacting two files
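
The plan in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular database: the row layout `(key, sequence_number, value)`, the `compact` function, and the key-only tombstones are all assumptions made for the example. Real tombstones usually carry predicates and time ranges.

```python
from typing import Iterable

# Hypothetical row layout: (key, sequence_number, value). The key stands in
# for the primary (deduplication) key; a higher sequence number means the row
# was ingested later.
Row = tuple[str, int, float]

def compact(files: Iterable[list[Row]], tombstones: set[str]) -> list[Row]:
    # Scan: read every row of every input file.
    scanned = [row for f in files for row in f]
    # Filter Deleted Data: drop rows whose key has a tombstone (simplified).
    live = [row for row in scanned if row[0] not in tombstones]
    # Sort on the primary key, then by ingestion order.
    live.sort(key=lambda row: (row[0], row[1]))
    # Deduplicate: later rows overwrite earlier rows with the same key.
    latest: dict[str, Row] = {}
    for row in live:
        latest[row[0]] = row
    # The result is what would be encoded and written as one compacted file.
    return list(latest.values())

file_1 = [("a", 1, 1.0), ("b", 2, 2.0)]
file_2 = [("c", 4, 9.0), ("a", 3, 5.0)]
compacted = compact([file_1, file_2], tombstones={"b"})
```

Here key `b` is removed by its tombstone, and the two rows for key `a` collapse into the later one.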

Even though the compaction plan in Figure 1 combines all three tasks in one scan of the data and avoids reading the same set of data three times, the plan operators such as filter and sort are still not cheap. Let us see whether we can avoid or optimize these operators further.

Optimized compaction plan

Figure 2 shows the optimized version of the plan in Figure 1. There are two major changes:

The operator Filter Deleted Data is pushed into the Scan operator. This predicate pushdown is an effective way to filter data while scanning.
We no longer need the Sort operator because the input data files are already sorted on the primary key during data ingestion. The Deduplicate & Merge operator is implemented to keep its output data sorted on the same key as its inputs. Thus, the compacted data is also sorted on the primary key for future compaction if needed.

Figure 2: Optimized process of compacting two sorted files
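
The two changes can be sketched as follows, assuming each input file is already sorted on `(key, sequence_number)`. The row layout and the function names `scan` and `compact_sorted` are hypothetical. The standard library's `heapq.merge` stands in for the Deduplicate & Merge operator's streaming merge.

```python
import heapq
from itertools import groupby
from typing import Iterator

Row = tuple[str, int, float]  # hypothetical: (key, sequence_number, value)

def scan(file: list[Row], tombstones: set[str]) -> Iterator[Row]:
    # Scan with the delete filter pushed down: deleted rows never leave the scan.
    return (row for row in file if row[0] not in tombstones)

def compact_sorted(files: list[list[Row]], tombstones: set[str]) -> list[Row]:
    # Inputs are sorted, so a streaming k-way merge replaces the Sort operator.
    merged = heapq.merge(
        *(scan(f, tombstones) for f in files),
        key=lambda row: (row[0], row[1]),
    )
    # Rows with equal keys are now adjacent; keep the latest row of each group.
    output = []
    for _, rows in groupby(merged, key=lambda row: row[0]):
        output.append(list(rows)[-1])
    return output  # still sorted on the key, ready for a future compaction

sorted_file_1 = [("a", 1, 1.0), ("b", 2, 2.0)]
sorted_file_2 = [("a", 3, 5.0), ("c", 4, 9.0)]
merged_output = compact_sorted([sorted_file_1, sorted_file_2], tombstones={"b"})
```

Because the merge is lazy, only one row per input file needs to be held in memory at a time, plus the rows of the key currently being deduplicated.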

Note that, if the two input files contain data of different columns, which is common in some databases such as InfluxDB, we will need to keep their sort order compatible to avoid doing a re-sort. For example, let’s say the primary key contains columns a, b, c, d, but File 1 includes only columns a, c, d (as well as other columns that are not a part of the primary key) and is sorted on a, c, d. If the data of File 2 is ingested after File 1 and includes columns a, b, c, d, then its sort order must be compatible with File 1’s sort order a, c, d. This means column b could be placed anywhere in the sort order, but c must be placed after a and d must be placed after c. For implementation consistency, the new column, b, could always be added as the last column in the sort order. Thus the sort order of File 2 would be a, c, d, b.
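
The rule in this example can be expressed as a small sketch. The function names `extend_sort_order` and `is_compatible` are hypothetical and only illustrate the "append new columns last" convention described above.

```python
def extend_sort_order(existing: list[str], key_columns: list[str]) -> list[str]:
    # Keep the existing order untouched and append unseen key columns last.
    new_columns = [c for c in key_columns if c not in existing]
    return existing + new_columns

def is_compatible(order_a: list[str], order_b: list[str]) -> bool:
    # Two sort orders are compatible when the columns they share appear
    # in the same relative order in both.
    shared = set(order_a) & set(order_b)
    return [c for c in order_a if c in shared] == [c for c in order_b if c in shared]

file_1_order = ["a", "c", "d"]
file_2_order = extend_sort_order(file_1_order, ["a", "b", "c", "d"])
```

With these inputs, `file_2_order` is `["a", "c", "d", "b"]`. An order such as `["b", "a", "c", "d"]` would also be compatible, while `["a", "d", "c", "b"]` would not.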

Another reason to keep the data sorted is that, in a column-stored format such as Parquet and ORC, encoding works well with sorted data. For the common RLE encoding, the lower the cardinality (i.e., the lower the number of distinct values), the better the encoding. Hence, putting the lower-cardinality columns first in the sort order of the primary key will not only help compress data more on disk but more importantly help the query plan to execute faster. This is because the data is kept encoded during execution, as described in this paper on materialization strategies.
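
A small experiment makes the RLE argument concrete. The sketch below counts the runs that run-length encoding would have to store for each column under two sort orders. The data set (two regions, eight hosts) and the helper names are made up for illustration.

```python
from itertools import groupby

def run_count(values: list) -> int:
    # Number of runs of equal adjacent values, i.e. entries RLE must store.
    return sum(1 for _ in groupby(values))

def total_runs(rows: list[tuple], sort_order: tuple[int, ...]) -> int:
    # Sort the rows on the given column order, then count runs in every column.
    ordered = sorted(rows, key=lambda row: tuple(row[c] for c in sort_order))
    return sum(run_count([row[c] for row in ordered]) for c in range(len(rows[0])))

REGION, HOST = 0, 1  # column indexes in the hypothetical rows below
rows = [("eu" if h % 2 == 0 else "us", f"host{h}") for h in range(8)]

low_cardinality_first = total_runs(rows, (REGION, HOST))
high_cardinality_first = total_runs(rows, (HOST, REGION))
```

Sorting on the low-cardinality `region` column first groups its two values into two long runs, whereas sorting on `host` first leaves `region` alternating on every row, so fewer runs are stored in the first case.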

Compaction levels

To avoid the expensive deduplication operation, we want to manage the data files in a way that we know whether they potentially share duplicate data with other files or not. This can be done by using the technique of data overlapping. To simplify the examples of the rest of this article, we will assume that the data sets are time series in which data overlapping means that their data overlap on time. However, the overlap technique could be defined on non-time series data, too.

One of the strategies to avoid recompacting well-compacted files is to define levels for the files. Level 0 represents newly ingested small files and Level 1 represents compacted, non-overlapping files. Figure 3 shows an example of files and their levels before and after the first and second rounds of compaction. Before any compaction, all of the files are Level 0 and they potentially overlap in time in arbitrary ways. After the first compaction, many small Level 0 files have been compacted into two large, non-overlapped Level 1 files. In the meantime (remember this is a background process), more small Level 0 files have been loaded in, and these kick-start a second round of compaction that compacts the newly ingested Level 0 files into the second Level 1 file. Given our strategy to keep Level 1 files always non-overlapped, we do not need to recompact Level 1 files if they do not overlap with any newly ingested Level 0 files.

Figure 3: Ingested and compacted files after two rounds of compaction
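
One way to picture this strategy is a function that selects the inputs for the next compaction round. The sketch below is an assumption-laden illustration: the `DataFile` record, its integer time bounds, and `pick_compaction_inputs` are hypothetical names, not part of any real catalog API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataFile:
    name: str
    level: int
    min_time: int  # inclusive time bounds of the rows in the file
    max_time: int

def overlaps(a: DataFile, b: DataFile) -> bool:
    return a.min_time <= b.max_time and b.min_time <= a.max_time

def pick_compaction_inputs(files: list[DataFile]) -> list[DataFile]:
    # All Level 0 files are candidates, since they may overlap arbitrarily.
    level_0 = [f for f in files if f.level == 0]
    # A Level 1 file is recompacted only if a Level 0 file overlaps it.
    level_1 = [
        f for f in files
        if f.level == 1 and any(overlaps(f, new) for new in level_0)
    ]
    return level_0 + level_1

catalog = [
    DataFile("f1", level=1, min_time=0, max_time=99),
    DataFile("f2", level=1, min_time=100, max_time=199),
    DataFile("f3", level=0, min_time=150, max_time=180),
    DataFile("f4", level=0, min_time=170, max_time=210),
]
inputs = [f.name for f in pick_compaction_inputs(catalog)]
```

In this example `f1` is left alone because no newly ingested Level 0 file overlaps its time range.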

If we want to add different levels of file size, more compaction levels (2, 3, 4, etc.) could be added. Note that, while files of different levels may overlap, no files should overlap with other files in the same level.

We should try to avoid deduplication as much as possible, because the deduplication operator is expensive. Deduplication is especially expensive when the primary key includes many columns that need to be kept sorted. Building fast and memory-efficient multi-column sorts is critically important. Some common techniques to do so are described here and here.

Data querying

The system that supports data compaction needs to know how to handle a mixture of compacted and not-yet-compacted data. Figure 4 illustrates three files that a query needs to read. File 1 and File 2 are Level 1 files. File 3 is a Level 0 file that overlaps with File 2.

Figure 4: Three files that a query needs to read

Figure 5 illustrates a query plan that scans those three files. Because File 2 and File 3 overlap, they need to go through the Deduplicate & Merge operator. File 1 does not overlap with any file and only needs to be unioned with the output of the deduplication. Then all unioned data will go through the usual operators that the query plan has to process. As we can see, the more compacted, non-overlapped files compaction produces as pre-query processing, the less deduplication work the query has to perform.

Figure 5: Query plan that reads two overlapped files and one non-overlapped one
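
A planner could decide which files need the Deduplicate & Merge operator by grouping them on time overlap. The sketch below assumes each file is described by a hypothetical `(name, min_time, max_time)` tuple, and the time ranges are invented to match the situation in Figure 4.

```python
def overlap_groups(files: list[tuple[str, int, int]]) -> list[list[str]]:
    # Sweep over the files in start-time order, merging a file into the
    # current group whenever it starts before that group ends.
    groups: list[list[str]] = []
    group_end = 0
    for name, start, end in sorted(files, key=lambda f: f[1]):
        if groups and start <= group_end:
            groups[-1].append(name)
            group_end = max(group_end, end)
        else:
            groups.append([name])
            group_end = end
    return groups

files = [("File 1", 0, 99), ("File 2", 100, 199), ("File 3", 150, 220)]
groups = overlap_groups(files)

# Groups with more than one file must be deduplicated; single files are
# only unioned with the rest of the plan's input.
needs_dedup = [g for g in groups if len(g) > 1]
union_only = [g[0] for g in groups if len(g) == 1]
```

With these ranges, File 2 and File 3 form one group that goes through deduplication, and File 1 is scanned and unioned directly.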

Isolated and hidden compactors

Since data compaction includes only post-ingestion and pre-query background tasks, we can perform them using a completely hidden and isolated server called a compactor. More specifically, data ingestion, queries, and compaction can be processed using three respective sets of servers: ingestors, queriers, and compactors that do not share resources at all. They only need to connect to the same catalog and storage (often cloud-based object storage), and follow the same protocol to read, write, and organize data.

Because a compactor does not share resources with other database servers, it can be implemented to handle compacting many tables (or even many partitions of a table) concurrently. In addition, if there are many tables and data files to compact, several compactors can be provisioned to independently compact these different tables or partitions in parallel.

Furthermore, if compaction requires significantly fewer resources than ingestion or querying, then the separation of servers will improve the efficiency of the system. That is, the system could draw on many ingestors and many queriers to handle large ingestion workloads and queries in parallel, while only needing one compactor to handle all of the background post-ingestion and pre-query work. Similarly, if compaction needs a lot more resources, a system of many compactors, one ingestor, and one querier could be provisioned to meet the demand.

A well-known challenge in databases is how to manage the resources of their servers — the ingestors, queriers, and compactors — to maximize their utilization of resources (CPU and memory) while never hitting out-of-memory incidents. It is a large topic and deserves its own blog post.

Compaction is a critical background task that enables low latency for data ingestion and high performance for queries. The use of shared, cloud-based object storage has allowed database systems to leverage multiple servers to handle data ingestion, querying, and compacting workloads independently. For more information about the implementation of such a system, check out InfluxDB IOx. Other related techniques needed to design the system can be found in our companion articles on sharding and partitioning.