InfluxData Blog - Company

From Edge to Enterprise: How Litmus and InfluxDB Are Modernizing the Industrial Data Stack

Ben Corbett (InfluxData) — Mon, 20 Apr 2026 00:00:00 +0000

Today at Hannover Messe, InfluxData is announcing a strategic partnership with Litmus to address one of the most persistent challenges in industrial data: getting reliable, contextualized telemetry from the shop floor into production systems.

Litmus bridges the gap between OT systems and modern IT infrastructure, while InfluxDB serves as the industrial data hub, giving organizations both real-time operational visibility and enterprise-scale historical analysis in a unified architecture.

By integrating Litmus Edge with InfluxDB 3 Enterprise, teams can collect and contextualize data at the source, then write it into a time series engine built for high-resolution data. Litmus handles connectivity and data normalization at the edge. InfluxDB provides high-throughput ingestion, real-time querying, and cost-efficient long-term storage, deployable at the edge, in the enterprise layer, or both.

The result is a system that captures every signal, retains its context, and makes it immediately usable.

The industrial data problem

Something has shifted in industrial sectors. Modernization is no longer just a roadmap item; it is underway, and it’s starting to hit real constraints. The pull: industrial AI initiatives, predictive maintenance, cross-site analytics, and digital twins all offer attractive value propositions. The push: legacy data historians are buckling under the demands of modern industrial operations, and the cost of extending them is becoming harder to justify.

OT environments are notoriously fragmented. PLCs, CNCs, SCADA systems, and sensors operate across different protocols, vendors, and network boundaries. Getting that data into a usable, consistent format still requires heavy integration, time, and cost.

Traditional historians made progress on the industrial data problem, but they weren’t built for what comes next. They struggle to preserve context across systems, degrade under high-frequency ingest and query load, and make cross-site analysis slow and expensive. This forces teams into trade-offs between fidelity, scale, and cost.

That’s the core issue: the value of industrial data is in its resolution and context. Most systems weren’t designed to retain either at scale.

How Litmus and InfluxDB work together

To move forward, teams need an architecture built for how industrial data actually behaves: high-frequency, distributed, and context-dependent. Litmus Edge and InfluxDB 3 Enterprise provide that foundation by collecting and structuring data at the edge, then making it available centrally without losing resolution or context.

Here’s how that looks in practice:

250+ prebuilt industrial connectors. Out-of-the-box connectivity to industrial data sources, including legacy systems and proprietary protocols. No custom integration required.
Collect and contextualize at scale. Normalize and contextualize telemetry at the source, with unlimited cardinality that preserves full context without compromising query performance.
Centralized data, not silos. Bring telemetry from tools, teams, and sites into a single architecture, from single-site monitoring to cross-plant analytics, without a data consolidation project.
Buffered, store-and-forward data transfer. Buffer and transmit data from remote sites with intermittent connectivity, with no loss or manual recovery.
Retain more, spend less. Keep high-resolution data accessible long-term with object storage, without driving up storage costs as you scale.

The edge: collect, contextualize, buffer

Litmus Edge acts as the intelligence layer between your machines and the rest of your data architecture. With 250+ native connectors spanning OPC-UA, Modbus, MQTT, FANUC, Siemens S7, and more, it connects directly to industrial sources (PLCs, CNCs, DCS, SCADA systems, sensors, and beyond) without custom integration.

But connectivity alone isn’t enough. Raw signals without context aren’t useful. Litmus Edge tags, enriches, and structures data at the point of collection so a temperature reading is tied to an asset, production line, facility, and product run. By the time it leaves the edge, it’s already queryable.
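To make the idea of contextualized telemetry concrete, here is a minimal sketch of how one temperature reading might be serialized as InfluxDB line protocol, with the asset, production line, facility, and product run carried as tags. The helper functions, tag names, and values are hypothetical and only for illustration; this is not Litmus Edge's actual output format.

```python
def escape_tag(value: str) -> str:
    """Escape the characters line protocol treats as special in tag keys and values."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Build one line-protocol record: measurement,tags fields timestamp.

    Assumes the measurement name has no special characters and all fields are numeric.
    """
    tag_part = ",".join(
        f"{escape_tag(key)}={escape_tag(str(value))}" for key, value in sorted(tags.items())
    )
    field_part = ",".join(f"{escape_tag(key)}={float(value)}" for key, value in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"


# Hypothetical context for a single reading.
line = to_line_protocol(
    "temperature",
    tags={"facility": "plant_a", "line": "line_3", "asset": "press_12", "product_run": "run_0042"},
    fields={"value": 71.3},
    timestamp_ns=1776643200000000000,
)
print(line)
```

Because the context travels with every point as tags, a query can filter or group by facility, line, asset, or run without joining against a separate asset registry.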

The industrial data hub: centralize, scale, retain

InfluxDB 3 serves as the system of record for industrial time series data, whether deployed at the edge, centralized in the enterprise layer, or both.

At the site level, InfluxDB runs locally alongside Litmus Edge, ingesting full-resolution telemetry and serving low-latency queries for real-time operations. It operates autonomously, so if connectivity to the central hub is interrupted, data is buffered locally and automatically forwarded when the connection is restored. There’s no data loss or manual intervention.
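The store-and-forward behavior described here can be illustrated with a small sketch. It is a generic, in-memory version of the pattern, not how Litmus Edge or InfluxDB implement it: the class name, the `send` callback, and the batch size are assumptions, and a production buffer would persist pending data to disk so that it survives a restart.

```python
from collections import deque
from itertools import islice
from typing import Callable, Deque, List


class StoreAndForwardBuffer:
    """Queue records locally and forward them in order once the uplink works."""

    def __init__(self, send: Callable[[List[str]], None], batch_size: int = 500):
        self._send = send              # hypothetical uplink; raises ConnectionError when down
        self._batch_size = batch_size
        self._pending: Deque[str] = deque()

    def write(self, record: str) -> None:
        """Always accept the record locally, then try to forward."""
        self._pending.append(record)
        self.flush()

    def flush(self) -> int:
        """Forward pending records oldest-first; return how many were sent."""
        sent = 0
        while self._pending:
            batch = list(islice(self._pending, self._batch_size))
            try:
                self._send(batch)
            except ConnectionError:
                break  # link is down; keep everything queued for the next attempt
            for _ in batch:
                self._pending.popleft()  # drop records only after a successful send
            sent += len(batch)
        return sent

    @property
    def pending(self) -> int:
        return len(self._pending)
```

Records leave the queue only after `send` returns without error, so an outage delays delivery rather than dropping data.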

At the enterprise level, a centralized InfluxDB cluster aggregates data from every site into a single query layer across assets, plants, and time horizons. This creates a consistent, high-resolution data layer that can be used across operations, analytics, and industrial AI.

The bridge to higher-level analytics

With high-resolution, contextualized data available across systems, teams can move beyond basic monitoring. Predictive maintenance, anomaly detection, and cross-site analytics all depend on full-fidelity data. Industrial AI at the edge depends on low-latency access to it. Without that foundation, these systems don’t operate reliably. That’s what this architecture enables.

Get started

Whether you’re starting a greenfield initiative or hitting the limits of your current industrial data infrastructure, we’d love to talk.

Reach out to connect with an expert, or join the conversation in the InfluxData Community Forums, where our team and broader community are active.

If you’re attending Hannover Messe, come find me at the Litmus booth (Stand A09 in Hall 16) and see the architecture running end-to-end.

InfluxDB 3 on Amazon Timestream for InfluxDB: Real-Time Performance, Now Fully Managed on AWS

Evan Kaplan (InfluxData) — Thu, 16 Oct 2025 07:30:00 +0000

Today, we’re announcing a major milestone for developers building the next generation of intelligent, real-time systems: InfluxDB 3 is available on Amazon Timestream for InfluxDB, now the default time series database offered directly in the AWS Management Console. This brings InfluxDB 3, our next-generation time series database, directly into the AWS ecosystem for the first time. Starting today, both InfluxDB 3 Core (open source) and InfluxDB 3 Enterprise are available as fully-managed services on Amazon Timestream for InfluxDB, giving developers a direct path to deploy and scale real-time workloads on AWS.

We’re proud to bring InfluxDB to AWS in a way that not only meets developers where they already build, but also reinforces our commitment to the open source model, ensuring it remains vibrant and community-driven.

This is about meeting developers at the moment AI crosses from the digital world into the physical one, where precision and reliability become mission-critical.

Precision is Critical

While AI tools, such as LLMs, excel at generating code, copy, and images, their nondeterministic nature means their output is inherently unpredictable. That core trait makes them unsuitable for control-oriented problems, like steering industrial equipment, balancing energy grids, or guiding autonomous vehicles. These applications demand verifiable outcomes, which require systems that operate with deterministic precision. In these domains, a single misread signal can lead to downtime, safety risks, or costly disruptions, so every action must be provably correct.

High-resolution, high-cardinality time series is the foundation for real-world intelligence. It captures not just what changed, but how it changed, and the context of those changes across related signals. That combination of real-time data and context gives Industrial AI and machine learning models the predictive intelligence they need to act with confidence.

Built for the Future of Time Series

InfluxDB 3 is purpose-built for these demands. It ingests millions of unique measurements every second—fast enough to capture nanosecond-level changes in a power grid or every micro-movement of an industrial robot. Queries return in under 10 milliseconds, so predictive models can learn and react in real-time, often before an operator even knows there’s a problem.

Its built-in Python Processing Engine lets developers enrich and analyze data as it arrives, powering on-the-fly forecasting, anomaly detection, and alerting right inside the database. Unlimited cardinality ensures that even as sensors multiply and event dimensions explode, performance remains steady, giving AI systems the high-resolution signal they need for deterministic decisions.
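As a rough illustration of the kind of logic that could run against data as it arrives, here is a rolling z-score anomaly detector in plain Python. It deliberately does not use the Processing Engine's plugin interface, which this post does not describe; the class name, window size, threshold, and readings are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class RollingZScoreDetector:
    """Flag a value that sits far from the mean of the recent window."""

    def __init__(self, window: int = 30, threshold: float = 3.0, min_samples: int = 5):
        self._values = deque(maxlen=window)  # most recent readings only
        self._threshold = threshold
        self._min_samples = min_samples

    def observe(self, value: float) -> bool:
        """Return True if the value is anomalous relative to the window so far."""
        is_anomaly = False
        if len(self._values) >= self._min_samples:
            mu = mean(self._values)
            sigma = pstdev(self._values)
            # Skip the check when the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) / sigma > self._threshold:
                is_anomaly = True
        self._values.append(value)
        return is_anomaly


# Hypothetical temperature readings with one obvious spike at the end.
detector = RollingZScoreDetector(window=10, threshold=3.0)
readings = [20.0, 20.1, 19.9, 20.0, 20.2, 19.8, 20.1, 20.0, 35.0]
flags = [detector.observe(r) for r in readings]
print(flags)
```

The detector keeps only a bounded window in memory, so its cost per point stays constant no matter how long the stream runs.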

The Power of AWS + InfluxDB

Bringing InfluxDB 3 to Amazon Timestream for InfluxDB puts these capabilities on AWS’s global infrastructure and makes InfluxDB the go-to time series option directly inside the AWS console. Developers can stream and query high-resolution data at scale and immediately connect with services like AWS Lambda, SageMaker, and Kinesis to train and deploy models in real-time.

Running InfluxDB 3 on AWS removes the usual operational overhead, allowing teams to concentrate on building applications and acting on insights. With no infrastructure to provision and no complex configuration, it brings together the openness and innovation of InfluxDB and the reliability and scale of AWS services.

Just as important, it ensures we continue to offer open source that’s free, community-driven, and available to all. Open source remains the core of who we are and the engine that drives our community forward. That commitment helps tame the inherent complexity of time series, making it more approachable for developers and more powerful in practice.

Enterprise-Grade from Day One

For production workloads, InfluxDB 3 Enterprise on Amazon Timestream for InfluxDB delivers:

Multi-region durability and automatic failover to safeguard data and ensure business continuity
Read replicas for high availability and elastic query throughput
Enhanced security and a diskless, cloud-native architecture for seamless scalability and consistently low-latency performance

Developers can start with InfluxDB 3 Core, our open source engine, and easily scale to Enterprise without re-architecting or migrating data.

Powering the Next Generation of Intelligent Systems

Bringing InfluxDB 3 to Amazon Timestream for InfluxDB opens a new chapter: time series at cloud scale, ready for the next wave of AI and real-time applications.

This is the infrastructure that will power tomorrow’s breakthroughs: systems that learn, react, and adapt with nanosecond precision.

To our community: your feedback and innovation continue to guide our journey. We can’t wait to see what you build.

Siemens Energy Standardizes Predictive Maintenance Operations on InfluxDB

Company (InfluxData) — Thu, 26 Sep 2024 08:00:00 +0000

Global energy leader scales and optimizes real-time data operations with InfluxData’s self-managed database

SAN FRANCISCO – September 26, 2024 – InfluxData, creator of the leading time series database InfluxDB, today announced that Siemens Energy, a global leader in sustainable energy solutions, is using InfluxDB to optimize data collection and analysis across its energy storage operations. Siemens Energy uses InfluxDB for predictive maintenance on its automated battery and marine production lines, allowing the company to gather high-frequency, high-resolution sensor data in real-time to power advanced monitoring and control systems.

“Siemens Energy had long used InfluxDB open source, but as we scaled, we needed a platform capable of handling the complexity, security, and real-time demands of our expanding operations,” said Jan Petersen, Senior Manufacturing Engineer at Siemens Energy. “Moving to commercial InfluxDB was a strategic move to unify our data infrastructure, ensuring we have the reliability, scalability, and real-time performance to keep pace with production needs. InfluxDB delivers real-time visibility across teams and different projects, enabling faster decision-making and proactive maintenance to drive operational efficiency.”

Siemens Energy’s automated factory, which produces battery modules that power marine vessels and electric ferries, relies on InfluxDB to manage high-cardinality sensor data generated across production lines and customer sites. InfluxDB captures essential metrics—such as performance data and test results—ensuring consistent battery quality and reliability throughout the manufacturing process. While InfluxDB open source supported its initial operations, it couldn’t meet the growing demands for scalability and real-time performance as Siemens Energy’s workloads grew.

Since migrating to commercial InfluxDB, Siemens Energy significantly scaled its data operations, managing 700 high-volume write requests and 800 real-time queries per minute across research and development labs and production cells. The platform processes critical data from nearly 23,000 battery modules deployed at more than 70 locations globally. Each battery module generates over 100 unique sensor measurements every minute, with data transferred in bulk due to intermittent internet connectivity on these vessels. With InfluxDB’s ability to ingest and analyze billions of time series data points at high speed, Siemens Energy can optimize production workflows and maintain operational excellence, even in challenging remote conditions.
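A quick back-of-the-envelope calculation shows how these figures add up to billions of points. It is only a rough estimate: it rounds to 23,000 modules and assumes exactly 100 data points per module per minute, whereas the release says "nearly" and "over".

```python
MODULES = 23_000                          # "nearly 23,000 battery modules"
MEASUREMENTS_PER_MODULE_PER_MINUTE = 100  # "over 100 unique sensor measurements every minute"

points_per_minute = MODULES * MEASUREMENTS_PER_MODULE_PER_MINUTE
points_per_second = points_per_minute / 60
points_per_day = points_per_minute * 60 * 24

print(f"{points_per_minute:,} points per minute")
print(f"{points_per_second:,.0f} points per second")
print(f"{points_per_day:,} points per day")
```

Under these assumptions the fleet produces about 2.3 million points per minute and roughly 3.3 billion points per day.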

“Siemens Energy is setting new standards in industrial automation, and InfluxDB plays a critical role in the foundation of these systems,” said Dean Sheehan, EMEA Field Chief Technology Officer at InfluxData. “By harnessing time series data for predictive maintenance, Siemens Energy can anticipate and resolve challenges before they arise, ensuring smooth, uninterrupted performance across global operations. With InfluxDB providing real-time monitoring and control, Siemens Energy can focus on innovation, ensuring seamless operations in its push toward sustainability.”

Last year, InfluxData rebuilt the core of its database to deliver InfluxDB 3, which brings significant performance gains to time series workloads, including unlimited cardinality, high-speed ingest, and real-time querying. InfluxDB 3 gives developers an operational platform to manage high-resolution datasets without performance degradation, keeping systems responsive even when handling high-cardinality data. InfluxDB 3 is available to enterprises in InfluxDB Cloud Dedicated, a fully-managed, single-tenant time series database-as-a-service, as well as InfluxDB Clustered, a self-managed product for on-premises or private cloud deployments.

For more information on using InfluxDB 3 to power industrial operations, visit the InfluxData website.

About Siemens Energy

Siemens Energy is one of the world’s leading energy technology companies. The company works with its customers and partners on energy systems for the future, thus supporting the transition to a more sustainable world. With its portfolio of products, solutions, and services, Siemens Energy covers almost the entire energy value chain—from power and heat generation and transmission to storage. The portfolio includes conventional and renewable energy technology, such as gas and steam turbines, hybrid power plants operated with hydrogen, and power generators and transformers. Its wind power subsidiary Siemens Gamesa makes Siemens Energy a global market leader for renewable energies. An estimated one-sixth of the electricity generated worldwide is based on technologies from Siemens Energy. Siemens Energy employs around 99,000 people worldwide in more than 90 countries and generated revenue of €31 billion in fiscal year 2023. www.siemens-energy.com

About InfluxData

InfluxData is the creator of InfluxDB, the leading time series platform used to collect, store, and analyze all time series data at any scale. Developers can query and analyze their time-stamped data in real-time to discover, interpret, and share new insights to gain a competitive edge. InfluxData is a remote-first company with a globally distributed workforce. For more information, visit www.influxdata.com.

AWS Partners with InfluxData to Bring InfluxDB Open Source to Developers Around the World

Evan Kaplan (InfluxData) — Thu, 14 Mar 2024 12:00:00 +0000

Today, AWS announced Amazon Timestream for InfluxDB, a new managed offering for AWS customers to run single-instance open source InfluxDB natively within the AWS console. This partnership represents a significant multi-year commitment by AWS to combine its global reach and accessibility with our industry-leading time series database, InfluxDB. AWS adding InfluxDB as a preferred time series database reflects the demand from AWS customers for InfluxDB and is evidence of the accelerating time series market.

At InfluxData, we are deeply committed to open source: both building an open product that can be improved by InfluxData and the developer community and making this product accessible to as many developers as possible. This means simple, logical entry points and distributing InfluxDB to developers around the world.

To date, Timestream referred strictly to the time series database-as-a-service offered by Amazon. With the introduction of InfluxDB to the AWS product offering for time series, Amazon is also announcing the renaming of that database-as-a-service offering to “LiveAnalytics.” Moving forward, Timestream refers to the category of time series products offered by Amazon, which now includes LiveAnalytics and InfluxDB as distinct time series engines.

By partnering with AWS, we’re making open source InfluxDB available to developers in every industry, market, and region. In 2024, the cloud is the de facto platform for modern software. That also makes the cloud the best way to build, run, and distribute open source, so this partnership puts time series capabilities right in the hands of every developer working with AWS.

As developers increasingly rely on time series databases to manage IoT and real-time analytics workloads, our partnership with AWS becomes even more critical. Time series data is the foundation for highly intelligent, instrumented systems. Time series databases are key to collecting and managing data produced by these systems in real-time. AWS recognizes the rapid growth of this category and the need to give developers purpose-built tools to manage time series data with the lowest barrier to entry. Amazon Timestream for InfluxDB delivers on this vision.

Amazon Timestream for InfluxDB now available

Amazon Timestream for InfluxDB is a new way for developers to use a single-node open source InfluxDB on managed AWS infrastructure with the scalability, reliability, and security of the AWS ecosystem but without the overhead that comes with self-managing InfluxDB. With just a few clicks, you can quickly set up, migrate, operate, and scale an InfluxDB database on AWS. The managed instance automatically configures for optimal performance and availability so you can start building and running time series applications immediately. No upfront costs, licenses, or commitments are required, and you only pay for what you use.

Available today. The initial Amazon Timestream for InfluxDB offering is based on InfluxDB OSS 2.x, making it ideal for open source users with smaller workloads that only require a single-node instance. Through the console, customers can opt into additional performance and scalability. Today, this includes high availability with AWS Multi-AZ for synchronous data replication and backup across availability zones in as little as 60 seconds with zero data loss. It also includes AWS-grade security with encryption of data in transit and at rest with AWS Key Management Service (KMS) and a configurable network using Amazon Virtual Private Cloud (VPC).

Coming soon. Because we’re actively developing InfluxDB 3 open source, it isn’t yet available in Amazon Timestream for InfluxDB. The InfluxDB 3 product line will arrive in Timestream for InfluxDB in the form of InfluxDB Community, currently in development for the new product suite. Once available, we’ll deliver two new modules specific to the 3 product offering: a Scale Module for distributed querying from multiple instances and a Security Module for fine-grained permissions, single sign-on, and audit logging.

AWS-hosted, InfluxData managed, powered by InfluxDB 3

Last year, we delivered a full product suite based on InfluxDB 3, our rebuilt database engine designed for time series analytics. InfluxDB 3 can ingest and analyze millions of data points per second, delivering real-time analytics for high-cardinality time series workloads. It also provides a 90 percent reduction in storage costs through best-in-category compression. With InfluxDB 3, users can perform advanced analytics on large fleets of devices in real-time at an unlimited scale. InfluxDB is an open platform based on the Open Data Architecture and built for seamless lakehouse integration.

Whether you need to manage small or enterprise-scale workloads, InfluxData and AWS have a product for you. InfluxDB Cloud Serverless runs on AWS and delivers a fully managed, elastic database running on multi-tenant cloud infrastructure. It’s ideal for teams with small to medium workloads. InfluxDB Cloud Dedicated also runs on AWS and provides users with dedicated, fully-managed cloud infrastructure optimized for large-scale workloads with added enterprise-grade security features. InfluxDB Cloud Dedicated and Serverless are part of the InfluxDB 3 product suite and include the latest capabilities for real-time analytics, support for SQL, and unlimited cardinality data, resulting in significant performance improvements.

Leading the time series category

Time series databases are now foundational to the modern data stack. As the world around us becomes more instrumented, applications, sensors, and systems emit a relentless stream of time series data. This data, when collected and analyzed, delivers valuable real-time insights.

The partnership to make InfluxDB a preferred AWS time series database signifies InfluxData’s category leadership and the high-growth market for time series databases. Together with AWS, we will build on this momentum and drive the market forward with products that support developers at every stage of their time series journeys.

InfluxData Collaborating with AWS to Bring InfluxDB and Time Series Analytics to Developers Around the World

Company (InfluxData) — Thu, 14 Mar 2024 12:00:00 +0000

InfluxDB open source now offered as a managed service for time series data powered by AWS, accessible within the AWS Management Console

SAN FRANCISCO – March 14, 2024 – InfluxData, creator of the leading time series platform InfluxDB, today announced a collaboration with Amazon Web Services (AWS) to deliver Amazon Timestream for InfluxDB, a new managed offering for AWS customers to run InfluxDB open source natively within the AWS Management Console. Now generally available, Amazon Timestream for InfluxDB allows users to quickly build and run time series applications on managed infrastructure with AWS’s scalability, reliability, and security without the overhead of self-management.

Time series databases are a fundamental component of the modern data stack. By analyzing millions of time series data points per second, time series databases help detect failures, improve reliability, predict behavior for information technology (IT) monitoring and Internet of Things and Industrial Internet of Things (IoT/IIoT) processes, and enable real-time analytics. Making InfluxDB a preferred AWS time series database signifies InfluxData’s category leadership and the high-growth market for time series databases to power real-time analytics and artificial intelligence (AI) training models. By working with InfluxData, AWS developers now have a simple way to manage and derive value from time series data within the AWS Management Console.

“InfluxData is deeply committed to open source—both building an open, permissively licensed product and making it accessible to as many developers as possible through simple and logical entry points and wide availability,” said Evan Kaplan, CEO of InfluxData. “The cloud is now the modern way to build, run, and distribute open source solutions. Working with AWS gives us the broadest reach to developers from every industry, market, and region so they can get started using time series and InfluxDB quickly and easily.”

“We’re delighted to offer a managed service for InfluxDB. With Amazon Timestream for InfluxDB, now customers with stringent latency requirements for their time-series applications can benefit from open-source APIs and the ease-of-use that our customers enjoy with managed database services on AWS, reduced operational burden, and enhanced reliability for their InfluxDB workloads,” said Jeff Carter, Vice President of Databases and Migration Services at AWS. “Our work with InfluxData brings one of the most popular open-source time-series databases as a managed service for AWS customers.”

With Amazon Timestream for InfluxDB, customers can start using a single-instance open source version of InfluxDB immediately by creating a managed instance automatically configured for optimal performance and availability. There are no upfront costs, licenses, or commitments required, and customers only pay for the resources they use.

The initial Amazon Timestream for InfluxDB offering is based on InfluxDB OSS 2.x, making it ideal for open source users with workloads that only require a single-node instance. Through the console, customers can opt into additional performance and scalability capabilities. This includes high availability with multiple AWS Availability Zones (multi-AZ) for synchronous data replication and backup across AZs in as little as 60 seconds with zero data loss. It also includes encryption-in-transit and at-rest with the AWS Key Management Service (AWS KMS) and a configurable network using Amazon Virtual Private Cloud (Amazon VPC).

Customers using InfluxDB OSS 2.x can get started with the AWS managed service by using the AWS Management Console, CLI, CDK, or AWS CloudFormation to create a new InfluxDB database instance. Once created, customers can use the InfluxDB APIs to restore backups from self-managed database instances. For more information about Amazon Timestream for InfluxDB, see our blog post or visit the AWS Management Console to get started.

In the future, Amazon Timestream for InfluxDB will expand to offer additional InfluxDB versions, including the new InfluxDB 3.0 product line via InfluxDB Community, currently in development. AWS and InfluxData will also collaborate to offer AWS customers additional modules specific to the 3.0 product offering. This includes a Scale Module for distributed querying from multiple instances and a Security Module for fine-grained permissions, single sign-on, and audit logging.

“Our IoT-connected solar home systems help power the world’s most remote areas through time series data analysis with InfluxDB,” said Christopher Baker-Brian, Co-founder & CTO at Bboxx. “With the majority of our stack built on AWS, Amazon Timestream for InfluxDB makes time series analytics even more accessible, allowing us to deliver energy to more people—beyond the millions already in network.”

“InfluxData puts the developer first in everything they do, ensuring they have the tools they need to manage and scale time series workloads in any environment,” said Jorge de la Cruz, Senior Product Manager at Veeam Software. “Amazon Timestream for InfluxDB gives open source developers a simple path to capture the value of time series data without the burden of self-managing. Combining InfluxData’s commitment to meeting developers where they are with the unmatched accessibility of AWS makes managing time series data easy to set up, operate, and scale.”

Developers who want to experience the benefits of InfluxDB 3.0 and AWS can get started with InfluxDB Cloud Serverless, which runs on AWS and delivers a fully managed, elastic database running on multi-tenant cloud infrastructure. InfluxDB Cloud Dedicated also runs on AWS and provides users with dedicated, fully-managed cloud infrastructure optimized for large-scale workloads with added enterprise-grade security features. Both InfluxDB Cloud Dedicated and Serverless are part of the InfluxDB 3.0 product suite and include the latest capabilities for real-time analytics, support for SQL, and unlimited cardinality data, resulting in significant performance improvements. InfluxDB 3.0 is available for purchase directly from InfluxData.

InfluxData Achieves AWS Data and Analytics Competency Status

Michele Todd (InfluxData) — Thu, 11 Jan 2024 06:00:00 +0000

InfluxDB, the leading time series database, and AWS, the leading web services vendor, have a long-standing partnership. InfluxDB has been available as a SaaS product on AWS for many years.

And as InfluxDB has grown and matured, most notably with the release of InfluxDB 3.0 in 2023, so has our partnership with AWS. That’s why we’re excited to announce that InfluxData achieved AWS Data and Analytics Competency status in the Data Analytics Platforms and NoSQL/New SQL categories. This designation recognizes our expertise and success in helping customers leverage AWS services to build, manage, and analyze their data at scale.

As organizations increasingly rely on data to drive their business decisions, the need for robust and scalable data management and analytics solutions becomes paramount. One of the key factors at play is the vast amount of sensor data that companies need to cope with. Rapidly generated time series data requires purpose-built tools to ensure that systems and analysis can keep pace. With this competency designation, InfluxData demonstrates InfluxDB’s ability to deliver innovative and reliable solutions that meet the high standards set by AWS.

InfluxDB is a powerful time series database that enables organizations to store, analyze, and manage time-stamped data. With InfluxDB, businesses can capture and analyze real-time data from a range of sources, including IoT devices, sensors, and application logs. By leveraging AWS services such as Amazon EC2, Amazon S3, Lambda, IoT Core, and many others, InfluxDB users can easily scale their data infrastructure and unlock the full potential of their time series data across the AWS ecosystem.

Here are a few examples of customers that combine InfluxDB and AWS and do critical work in the process:

Teréga Solutions, a subsidiary of Teréga—one of France’s largest gas and transportation companies—uses InfluxDB hosted on AWS to power its modern data historian IO-Base. IO-Base, in turn, connects to Indabox, a proprietary gateway that connects to a range of sensors and systems. Using this technology, Teréga Solutions makes intelligent industrial operations available to organizations all over Europe.
Ju:niz Energy provides large-scale energy storage systems and develops intelligent energy management systems that control and optimize battery storage operations. The company uses InfluxDB, run on AWS, to collect sensor data on batteries and battery arrays to optimize energy consumption and enable predictive maintenance. This results in more efficient and accessible renewable energy.
LBBC Technologies is the world leader in the design and manufacturing of industrial autoclave technology. Aerospace customers use this equipment to manufacture high-performance castings, like turbine blades. LBBC uses InfluxDB Cloud to collect data from PLCs and other sensors to enable remote anomaly detection and predictive maintenance. These insights help LBBC monitor equipment anywhere in the world, identify issues before they occur, and resolve those issues faster.

We are proud that AWS recognizes our commitment to delivering innovative and reliable data and analytics solutions. This achievement further solidifies our position as a leader in the industry and reinforces our dedication to helping organizations harness the power of their data.

Bringing it all together: Speed, performance, and efficiency in InfluxDB 3.0

Jason Myers (InfluxData) — Fri, 03 Nov 2023 07:00:00 +0000

For most of the past year, we here at InfluxData focused on shipping the latest version of InfluxDB. To date, we launched three commercial products (InfluxDB Cloud Serverless, InfluxDB Cloud Dedicated, and InfluxDB Clustered), with more open source options on the way. All the while, we claimed that this latest version of InfluxDB surpasses anything we built before. We’re not in the business of making empty promises, so this post draws together all the information currently available to support those claims in one place.

There are several inter-related factors and developments that contribute to the overall success of InfluxDB, and my goal is to draw clear connections between them. While some of this is necessarily simplified, I encourage readers to check out the in-depth, technical articles linked throughout if you’re interested in the details. So, without further ado, let’s dive in.

The Re-Write

At the outset of this journey, InfluxData founder Paul Dix decided to write the new database version in Rust. While Rust isn’t the easiest language to work with, it has many built-in advantages. It was also the case that some of the open source projects we used to build the new InfluxDB were written in Rust, but we’ll get back to those in a minute.

In addition to the shift from Go to Rust, we reconfigured the architecture of the database. One of the key decisions in this regard was to separate compute and storage. This allows InfluxDB to scale each of those components individually, giving users more flexibility with how they can scale their database.

One of the key challenges this new version of InfluxDB sought to solve was the cardinality problem. Because older versions wrote an index of the data to disk, the number of series that users could ingest and query without impacting performance correlated to the amount of memory committed to writing and managing the index (See this short explainer video). For high cardinality use cases, like traces, InfluxDB struggled.

We knew that optimizing InfluxDB to handle large time series workloads required us to solve the cardinality problem.

The road traveled

Spoiler alert: We solved the cardinality problem. It’s a bit of a chicken/egg question when thinking about what came first, the performance gains in the new InfluxDB or support for unlimited cardinality data. These and several other items are interrelated.

Tools

One of the most critical decisions our team made was to build the new version on the Apache Arrow ecosystem. The FDAP stack (Apache Flight, DataFusion, Arrow, and Parquet) provided a lot of core functionality, along with upstream open source tools that we could both use and contribute to. Not only did this expand our internal commitment to open source, but it allowed us to contribute time-specific code to these projects that facilitated the features of InfluxDB and simultaneously made InfluxDB more interoperable with other solutions built on the FDAP stack or its components.

A columnar database

Using Arrow as the data representation layer allowed us to build InfluxDB as a columnar database. This is a key development because it is a major shift from versions 1.x/2.x. One of the reasons that we opted for a columnar approach is that it lends itself to better data compression. The columnar approach also allowed us to rethink our data model. We simplified our data model so each measurement groups data together instead of separating it into individual time series. This means the database has less work to do at the point of ingest and can therefore ingest data faster.
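To make the data model change concrete, here is a minimal Python sketch of grouping one measurement's points into columns. The `to_columns` helper and the sample points are hypothetical and only illustrate the idea; they are not InfluxDB's internal representation.

```python
def to_columns(points):
    """Pivot a list of row-shaped points into one list per column."""
    names = []
    for point in points:
        for name in point:
            if name not in names:
                names.append(name)
    # Missing values become None so every column has the same length.
    return {name: [point.get(name) for point in points] for name in names}


# Hypothetical points for a single "cpu" measurement.
points = [
    {"time": 1, "host": "a", "usage": 0.5},
    {"time": 1, "host": "b", "usage": 0.7},
    {"time": 2, "host": "a", "usage": 0.6, "temp": 41.0},
]
cpu = to_columns(points)
print(cpu["host"])
print(cpu["temp"])
```

Each column holds values of one kind, stored next to each other, which is what makes per-column compression practical.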

Compression

The columnar framework makes it easier to compress data on a per-column basis. This is a huge boost in its own right. We coupled this with Apache Parquet as our data persistence format. Parquet is designed to work with columnar data and provides high data compression ratios. So, once the data gets to a Parquet file, we’re able to compress it even further.

The results

There’s no judgment if you skipped the “how” sections to get to the results of all our hard work.

At the outset, we wanted to eliminate the bottlenecks that prevented InfluxDB from being able to handle the full range of time series workloads. This meant solving the cardinality problem, which meant the database needed to be able to ingest and query vast amounts of data in real-time without impacting performance.

Ingest

To accomplish that, we restructured the database, which included separating compute and storage to make them independently scalable. We streamlined the data ingest process so that it uses fewer resources and doesn’t rely on an on-disk index. In fact, v3.0 can ingest data with 45x more throughput than previous open source versions.

Storage and compression

Once all that data hits InfluxDB, it keeps fresh and frequently queried data in a hot storage tier. Older data goes to a cold storage tier as Apache Parquet files. Thanks to a columnar format and the compression-friendly Parquet format, InfluxDB 3.0 delivers high-ratio compression. That means that you can store more, high-fidelity data in less space.

For use cases that rely on historical analysis, this is a huge win because you don’t need to choose between analytical integrity and storage costs. For the cold storage tier, we use low-cost cloud storage, like Amazon S3. When we combine all those factors, the result can be a cost savings of 90%+.

Query

So, at this point we can ingest a ton of data and store it in a cost-effective manner. Now we just need to be able to query that data and analyze it. For the query side of things, we leaned into Apache DataFusion, which is a state-of-the-art query engine built in Rust that uses Arrow as its in-memory model. In short, it’s really fast and works well with the rest of the database. An additional advantage to DataFusion is that it allowed us to build in native support for SQL, which a lot of users asked for over the years. Not only does SQL support reduce barriers to entry for many people, but the sheer speed of DataFusion also helps InfluxDB 3.0 deliver real-time results.

Data visualization

The last piece of the puzzle is data analysis and visualization. In version 3.0, we wanted to get back to focusing on the core database. Instead of investing our resources in custom visualization tools, we think it makes more sense to leverage other best-in-breed tools. In other words, instead of making this version of InfluxDB try to do everything, we focused on integration and interoperability. It has a native integration with Grafana, and can connect to Apache Superset and Tableau, with integrations for other tools in active development as well.

Looking ahead

Being built on open source also facilitates other integrations, such as those with artificial intelligence (AI) and machine learning (ML) solutions. ‘Real-world’ AI tools, which differ from generative AI tools, typically rely on time series data. These are the solutions that drive automation and predictive models for industrial operations, making them more efficient and effective. At the same time, these tools require large amounts of data to train their models, and the volume and velocity of time series data make it a key source. InfluxDB functions as the intermediary between data sources, AI models, and end-user analysis by managing that data and making it available to AI/ML tools in real-time.

Recap

To hammer home the point, check out these benchmarks, comparing InfluxDB 3.0 with previous open source versions. The speed and performance gains against our own – already leading – product are significant.

As time series data becomes increasingly critical across industries and sectors, the sheer volume of data produced requires a solution that can keep up at a real-world pace, in real-time, without sacrificing performance. InfluxDB 3.0 is that solution. And the best part? Version 3.0 is just the beginning; it’s only going to get better from here.

Take the time series leap with InfluxDB and get started building something awesome today.

How We Did It: Data Ingest and Compression Gains in InfluxDB 3.0

Rick Spencer (InfluxData) — Wed, 04 Oct 2023 07:35:00 +0000

A few weeks ago, we published some benchmarking that showed performance gains in InfluxDB 3.0 that are orders of magnitude better than previous versions of InfluxDB – and by extension, other databases as well. Two key factors drive these gains: data ingest and data compression. This raises the question: just how did we achieve such drastic improvements in our core database?

This post sets out to explain how we accomplished these improvements for anyone interested.

A review of TSM

To understand where we ended up, it’s important to understand where we came from. The storage engine that powered InfluxDB 1.x and 2.x was something we called the InfluxDB Time-Structured Merge Tree (TSM). Let’s take a brief look at data ingest and compression in TSM.

TSM data model

There’s a lot of information out there about the TSM data model, but if you want a quick overview, check out the TSM docs. Using that as a primer, let’s turn our attention to data ingest.

TSM ingest

TSM uses an inverted index to map metadata to a specific series on disk. If a write operation adds a new measurement or tag key/value pair, then TSM updates the inverted index. TSM needs to write this data in the proper place on disk, essentially indexing it when written. This whole process requires a significant amount of CPU and RAM.
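As a rough illustration of why this costs CPU and RAM, the following Python sketch models an inverted index that maps tag key/value pairs to series IDs. The `InvertedIndex` class and its methods are hypothetical and much simpler than the real TSM index; the point is that every new series adds entries that must be kept and searched.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: maps tag key/value pairs to series IDs."""

    def __init__(self):
        self.series_ids = {}               # series key -> series ID
        self.postings = defaultdict(set)   # (tag key, tag value) -> series IDs

    def write(self, measurement, tags):
        """Return the series ID for a point; update the index only for a new series."""
        series_key = (measurement, tuple(sorted(tags.items())))
        series_id = self.series_ids.get(series_key)
        if series_id is None:
            series_id = len(self.series_ids)
            self.series_ids[series_key] = series_id
            self.postings[("_measurement", measurement)].add(series_id)
            for pair in tags.items():
                self.postings[pair].add(series_id)
        return series_id

    def lookup(self, **tags):
        """Return the IDs of all series matching every given tag key/value pair."""
        sets = [self.postings.get(pair, set()) for pair in tags.items()]
        return set.intersection(*sets) if sets else set(self.series_ids.values())


index = InvertedIndex()
index.write("cpu", {"host": "a", "region": "eu"})
index.write("cpu", {"host": "b", "region": "eu"})
index.write("cpu", {"host": "a", "region": "eu"})  # existing series, no index update
print(index.lookup(region="eu"))
print(index.lookup(host="b", region="eu"))
```

In this sketch the index grows with the number of distinct series rather than the number of points, which is why high cardinality data is the expensive case.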

The TSM engine sharded files by time, which allowed the database to enforce retention policies, evicting expired data.

Introduction of TSI

Prior to 2017, InfluxDB calculated the inverted index on startup, maintained it during writes, and kept it in memory. This led to very long startup times, so in the autumn of 2017, we introduced the Time Series Index (TSI), which is essentially the inverted index persisted to disk. This created another challenge, however, because the size of TSI on disk could become very large, especially for high cardinality use cases.

TSM compression

TSM uses run-length encoding for compression. This approach is very efficient for metrics use cases where data timestamps occur at regular intervals. InfluxDB can store the starting time and time interval, and then calculate each timestamp at query time based only on row count. Additionally, the TSM engine can use run-length encoding on the actual field data. So, in cases where data does not change frequently and the timestamps are regular, InfluxDB can compress data very efficiently.

However, use cases with irregular timestamp intervals, or where the data changed with nearly every reading, reduced the effectiveness of compression. The TSI complicated matters further because it could get very large in high cardinality use cases. This meant that InfluxDB hit practical compression limits.
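The effect of timestamp regularity can be sketched in a few lines of Python. The functions below are a simplified, hypothetical model of storing a start time, an interval, and a count per run; they are not the actual TSM encoder.

```python
def rle_encode_timestamps(timestamps):
    """Encode timestamps as runs of (start, interval, count)."""
    runs = []
    i = 0
    while i < len(timestamps):
        start = timestamps[i]
        if i + 1 == len(timestamps):
            runs.append((start, 0, 1))   # trailing single value
            break
        interval = timestamps[i + 1] - start
        count = 2
        # Extend the run while the spacing stays the same.
        while (i + count < len(timestamps)
               and timestamps[i + count] - timestamps[i + count - 1] == interval):
            count += 1
        runs.append((start, interval, count))
        i += count
    return runs


def rle_decode_timestamps(runs):
    """Recompute every timestamp from its run."""
    return [start + k * interval for start, interval, count in runs for k in range(count)]


regular = list(range(0, 100, 10))     # one reading every 10 time units
irregular = [0, 3, 11, 12, 30, 31]    # no steady interval

print(len(rle_encode_timestamps(regular)), "run for", len(regular), "timestamps")
print(len(rle_encode_timestamps(irregular)), "runs for", len(irregular), "timestamps")
```

With regular timestamps the whole column collapses into a single run, while the irregular series needs one run for every two values, so there is little left to save.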

Introduction of InfluxDB 3.0

When we embarked upon architecting InfluxDB 3.0, we were determined to solve these limitations related to ingest efficiency and compression, and to remove cardinality limitations to make InfluxDB effective for a wider range of time series use cases.

3.0 Data model

We started by rethinking the data model from the ground up. While we retain the notion of separating data into databases, rather than persisting time series, InfluxDB 3.0 persists data by table. In the 3.0 world, a “table” is analogous to a “measurement” in InfluxDB 1.x and 2.x.

InfluxDB 3.0 shards each table on disk by day and persists that data in the Parquet file format. Visualizing that data on disk looks like a set of Parquet files. The database’s default behavior generates a new Parquet file every fifteen minutes. Later, the compaction process can take those files and coalesce them into larger files where each file represents one day of data for a single measurement. The caveat here is that we limit the size of each Parquet file to 100 megabytes, so heavy users may have multiple files per day.
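A small Python sketch can illustrate the default layout described above. The key format `table/YYYY-MM-DD`, the helper names, and the reading of "100 megabytes" as 100,000,000 bytes are all assumptions made for illustration, not InfluxDB's actual file naming or limits.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone

MAX_FILE_BYTES = 100 * 1000 * 1000  # assumed reading of the 100 megabyte cap


def partition_key(table, timestamp_s):
    """Return a hypothetical partition key: one partition per table per UTC day."""
    day = datetime.fromtimestamp(timestamp_s, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{table}/{day}"


def files_needed(partition_bytes):
    """Number of size-capped files needed to hold one day of one table."""
    return max(1, math.ceil(partition_bytes / MAX_FILE_BYTES))


rows = [("cpu", 1696400000), ("cpu", 1696403600), ("cpu", 1696486400), ("mem", 1696400000)]
partitions = defaultdict(list)
for table, ts in rows:
    partitions[partition_key(table, ts)].append(ts)

print(sorted(partitions))
print(files_needed(250_000_000))
```

In this sketch, a table that writes 250 MB in a day ends up with three files for that day instead of one.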

Parquet file partitioning and data model example for InfluxDB 3.0

InfluxDB 3.0 retains the notion of tags and fields; however, they play a different role in 3.0. A unique set of tag values and a time stamp identifies a row, which enables InfluxDB to update it with an UPSERT on write.
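The row identity rule can be sketched as a dictionary keyed by the tag set plus the timestamp. The `upsert` function below is hypothetical, and merging new field values into the existing row is an assumption about the upsert semantics.

```python
def upsert(table, tags, timestamp, fields):
    """Insert a row, or merge fields into the row with the same tags and timestamp."""
    key = (tuple(sorted(tags.items())), timestamp)
    table.setdefault(key, {}).update(fields)


table = {}
upsert(table, {"sensor": "s1"}, 100, {"temp": 20.0})
upsert(table, {"sensor": "s1"}, 100, {"temp": 21.5, "rpm": 900})  # same row, updated
upsert(table, {"sensor": "s2"}, 100, {"temp": 19.0})              # different tag value, new row
print(len(table))
```

Three writes produce two rows because the first two share the same tag values and timestamp.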

Now that we’ve explained a bit about the new data model, let’s turn to how it allows us to improve data ingest efficiency and compression.

Alternate partitioning options

We designed InfluxDB 3.0 to perform analytical queries (i.e., queries that summarize across a large number of rows) and optimized the default partitioning scheme for this. However, some users always need to query a subset of tag values for certain measurements, for example, when every query includes a specific customer ID or sensor ID. Due to the way it indexes data and persists it to disk, TSM handles this type of query very well. This is not the case with InfluxDB 3.0, so those using the default partitioning scheme may experience a regression in performance for these specific queries.

The solution to this in InfluxDB 3.0 is custom partitioning. Custom partitioning allows the user to define a partitioning scheme based on tag keys and tag values, and to configure the time range for each partition. This approach enables users to achieve similar query performance for these specific query types in InfluxDB 3.0 while retaining its ingest and compression benefits.
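To show why this helps, here is a hedged Python sketch of a partition key built from a tag value and a time bucket. The key layout, the `|` separator, and both helper functions are hypothetical; the actual configuration syntax is described in the InfluxDB documentation.

```python
from datetime import datetime, timezone


def custom_partition_key(tags, timestamp_s, tag_keys, time_format):
    """Build a hypothetical partition key from chosen tag values and a time bucket."""
    parts = [str(tags.get(key, "")) for key in tag_keys]
    stamp = datetime.fromtimestamp(timestamp_s, tz=timezone.utc)
    parts.append(stamp.strftime(time_format))
    return "|".join(parts)


def partitions_to_scan(all_keys, tag_value):
    """Only partitions whose leading tag value matches need to be read."""
    return sorted(key for key in all_keys if key.split("|")[0] == tag_value)


points = [
    ({"sensor_id": "s1"}, 1696400000),
    ({"sensor_id": "s2"}, 1696400000),
    ({"sensor_id": "s1"}, 1699000000),
]
# Partition by sensor_id, with one time bucket per month.
keys = {custom_partition_key(tags, ts, ["sensor_id"], "%Y-%m") for tags, ts in points}
print(partitions_to_scan(keys, "s1"))
```

A query that always filters on `sensor_id` can skip every partition for other sensors, which is the behavior the default time-only scheme cannot offer.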

InfluxDB 3.0 ingest path

The data model for InfluxDB 3.0 is simpler than previous versions because each measurement groups all of its data together instead of separating it out by series. This streamlines the data ingest process. When InfluxDB 3.0 handles a write, it validates the schema and then finds the structure in memory for any other data for that measurement. The database then appends the data to the measurement and returns a success or failure to the user. At the same time, it appends the data to a write ahead log (WAL). We call these in-memory structures “mutable batches” or “buffers” for reasons that will be clear below.

This process requires fewer compute resources compared to other databases, including InfluxDB 1.x and 2.x because, unlike other databases:

The ingest process does not sort or otherwise order data; it simply appends it.
The ingest process delays data deduplication until persistence.
The ingest process uses the buffer tree as an index. This index lives within each specific ingester instance and identifies the data required for specific queries. We were able to make this buffer tree extremely performant using hierarchical/sharded locking and reference counting to eliminate contention.

When the mutable batches run out of memory, InfluxDB persists the data in the buffers as Parquet files in object store. By default, InfluxDB will persist all the data into Parquet every 15 minutes if the memory buffer hasn’t been filled. This is the point where InfluxDB 3.0 sorts and deduplicates data. Deferring this work off the ‘hot/write path’ keeps latency and variance low.

Streamlining the ingest process in this way means that the database does less work on the hot/write path. Expensive operations move to persist time, so overall the process requires a minimal amount of CPU and RAM, even for significant write workloads.
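The split between the write path and persist time can be sketched as follows. The `MutableBatch` class is a hypothetical model: a row-count limit stands in for the memory limit, a Python list stands in for Parquet files in object storage, and the 15-minute timer is not modeled.

```python
class MutableBatch:
    """Append-only buffer that sorts and deduplicates only when it persists."""

    def __init__(self, max_rows):
        self.max_rows = max_rows     # stand-in for the memory limit
        self.rows = []
        self.persisted_files = []    # stand-in for Parquet files in object storage

    def write(self, tags, timestamp, fields):
        # Hot path: append only, with no sorting and no duplicate checks.
        self.rows.append((tuple(sorted(tags.items())), timestamp, fields))
        if len(self.rows) >= self.max_rows:
            self.persist()

    def persist(self):
        # Persist time: deduplicate (later writes win), then sort.
        latest = {}
        for tag_set, timestamp, fields in self.rows:
            latest.setdefault((tag_set, timestamp), {}).update(fields)
        if latest:
            self.persisted_files.append(sorted(latest.items(), key=lambda item: item[0]))
        self.rows = []


batch = MutableBatch(max_rows=4)
batch.write({"host": "b"}, 2, {"usage": 0.9})
batch.write({"host": "a"}, 2, {"usage": 0.4})
batch.write({"host": "a"}, 1, {"usage": 0.3})
batch.write({"host": "a"}, 2, {"usage": 0.5})   # duplicate key; fills the buffer
print(batch.persisted_files[0])
```

The four out-of-order writes cost four appends on the write path; the single sort and deduplication pass runs once, when the buffer is persisted.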

For completeness, there is also a Write Ahead Log (WAL), flushed periodically, that ensures all writes are immediately durable and that InfluxDB uses to reconstitute the mutable batches if the container fails. InfluxDB 3.0 only replays WAL files in unclean shutdown or crash situations. In the ‘happy path’ (e.g., an upgrade), the system gracefully stops, flushes (i.e., persists) all buffered data to object storage, and deletes all WAL entries. As a result, startup is fast, because InfluxDB doesn’t have to replay the WAL. Furthermore, in non-replicated deployments, the data that would otherwise sit on the WAL disk on an offline node is actually in object storage and readable, preserving read availability.
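The two shutdown paths can be illustrated with a toy log. The `WriteAheadLog` class, its JSON-lines file format, and the choice to sync on every append are simplifying assumptions for this sketch, not the actual InfluxDB WAL.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Toy WAL: one JSON document per line in a local file."""

    def __init__(self, path):
        self.path = path

    def append(self, entry):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())   # assumption: sync on every append

    def replay(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def clear(self):
        if os.path.exists(self.path):
            os.remove(self.path)


with tempfile.TemporaryDirectory() as tmp:
    wal = WriteAheadLog(os.path.join(tmp, "wal.log"))
    buffer = []
    for entry in ({"host": "a", "usage": 0.3}, {"host": "b", "usage": 0.9}):
        wal.append(entry)
        buffer.append(entry)

    buffer = []                    # simulated crash: in-memory data is lost
    recovered = wal.replay()       # unclean path: rebuild the buffer from the WAL

    persisted = list(recovered)    # graceful path: persist everything first...
    wal.clear()                    # ...then delete the WAL entries
    after_clean_shutdown = wal.replay()

print(recovered)
print(after_clean_shutdown)
```

After the graceful path there is nothing to replay, which is why startup is fast in that case.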

Leading edge queries

We also optimized InfluxDB 3.0 for querying the leading edge of data. Most queries, especially time sensitive ones, query the most recently written data. We call these “leading edge” queries.

When a query comes in, InfluxDB converts the data to Arrow, which then gets queried by the DataFusion query engine. In cases where all the data being queried already exists in the mutable batches, the ingester serves the querier the data in Arrow format, and then DataFusion quickly performs any necessary sorting and deduplicating before returning the data to the user.

In cases where some or all of the data is not in the read buffer, InfluxDB 3.0 uses its catalog of Parquet files – stored in a fast relational database – to find the specific files and rows to load into memory in the Arrow format. Once loaded into memory, DataFusion is able to quickly query this data.
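The catalog lookup can be sketched with an in-memory SQLite database standing in for the relational catalog. The `parquet_files` schema, the file paths, and the `files_for_query` helper are hypothetical; the sketch only shows how per-file time ranges let a query skip files that cannot contain matching rows.

```python
import sqlite3

# Hypothetical catalog: one row per Parquet file with the time range it covers.
catalog = sqlite3.connect(":memory:")
catalog.execute(
    "CREATE TABLE parquet_files (path TEXT, table_name TEXT, min_time INTEGER, max_time INTEGER)"
)
catalog.executemany(
    "INSERT INTO parquet_files VALUES (?, ?, ?, ?)",
    [
        ("cpu/file-1.parquet", "cpu", 0, 99),
        ("cpu/file-2.parquet", "cpu", 100, 199),
        ("cpu/file-3.parquet", "cpu", 200, 299),
        ("mem/file-1.parquet", "mem", 0, 99),
    ],
)


def files_for_query(table_name, start, end):
    """Return only the files whose time range overlaps [start, end]."""
    rows = catalog.execute(
        "SELECT path FROM parquet_files "
        "WHERE table_name = ? AND max_time >= ? AND min_time <= ? "
        "ORDER BY min_time",
        (table_name, start, end),
    )
    return [path for (path,) in rows]


print(files_for_query("cpu", 150, 250))
```

Only two of the four files overlap the requested range, so only those would be loaded into memory as Arrow data.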

Compaction

A series of compaction processes help maintain the data catalog. These processes optimize the stored Parquet files by ordering and deduplicating the data. This makes finding and reading Parquet files on disk efficient.
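One way to picture a compaction pass is a k-way merge of already-sorted files that keeps the newest value for each duplicate row. The `compact` function below is an illustrative sketch using Python lists in place of Parquet files; the rule that the later file wins is an assumption.

```python
import heapq


def compact(files):
    """Merge sorted files of (series, timestamp, value) rows into one sorted, deduplicated file.

    Files are ordered oldest to newest; for duplicate rows the newest file wins.
    """
    tagged = (
        [(series, ts, generation, value) for series, ts, value in rows]
        for generation, rows in enumerate(files)
    )
    merged = heapq.merge(*tagged, key=lambda row: row[:3])
    out = []
    for series, ts, _generation, value in merged:
        if out and out[-1][:2] == (series, ts):
            out[-1] = (series, ts, value)   # newer generation replaces older
        else:
            out.append((series, ts, value))
    return out


older = [("a", 1, 1.0), ("a", 2, 2.0)]
newer = [("a", 2, 2.5), ("b", 1, 9.0)]
print(compact([older, newer]))
```

Because each input is already sorted, the merge streams through the rows once rather than re-sorting everything.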

Data compression

There are a few key reasons why we selected the Apache Arrow ecosystem, including Parquet, for InfluxDB 3.0. These formats were designed from the ground up to support high performance analytical queries on large data sets. Because they’re designed for columnar data structures, they can also achieve significant compression. The Apache Arrow community, which we’re proud to be a part of and contribute to, continues to improve these technologies.

Parquet file format

So, because InfluxDB 3.0 starts with Arrow’s columnar structure, it inherits significant compression benefits. Using Parquet compounds these benefits. Furthermore, because InfluxDB is a time series database, we can make some assumptions about time series data that allow us to get the most out of these compression techniques.

For example, because InfluxDB 3.0 retains the notion of tag key/value pairs, we expected dictionary encoding on these columns to yield the best compression. In this context, dictionary encoding assigns a number to each tag value. This number takes up a small number of bytes on disk. Then, Parquet can run-length encode those numbers for each tag key column. On top of this encoding scheme, InfluxDB can apply general purpose compression (e.g., gzip, zstd) to compress the data even further.
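The layering described above can be sketched in Python. This is a simplified illustration, not Parquet's actual on-disk encoding: the helper functions and the JSON serialization are hypothetical, and `zlib` from the standard library stands in for the general purpose compressor.

```python
import json
import zlib


def dictionary_encode(values):
    """Assign a small integer code to each distinct value, in order of first appearance."""
    dictionary = {}
    codes = [dictionary.setdefault(value, len(dictionary)) for value in values]
    return list(dictionary), codes


def run_length_encode(codes):
    """Collapse repeated codes into [code, count] runs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return runs


def decode(dictionary, runs):
    """Expand runs back into the original column of tag values."""
    return [dictionary[code] for code, count in runs for _ in range(count)]


host_column = ["server-a"] * 4 + ["server-b"] * 3 + ["server-a"] * 2
dictionary, codes = dictionary_encode(host_column)
runs = run_length_encode(codes)

# General purpose compression applied on top of the encoded column.
payload = zlib.compress(json.dumps([dictionary, runs]).encode("utf-8"))

print(dictionary)
print(runs)
```

Nine string values reduce to a two-entry dictionary and three runs before the general purpose compressor even runs, which is the effect tag columns with few distinct values benefit from.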

The overall combination of Arrow and Parquet results in significant compression gains. When you combine those gains with the fact that InfluxDB 3.0 relies on object storage for historical data, users can store a lot more data, in less space, for a fraction of the cost.

Wrap up

Hopefully, you found this explanation of how InfluxDB 3.0 achieves such impressive ingest efficiency and compression interesting. In conjunction with the separation of compute and storage, customers can achieve significant total cost of ownership advantages over other databases, including InfluxDB 1.x and 2.x!

Try InfluxDB 3.0 to see how these performance and compression gains impact your applications.

InfluxData Announces InfluxDB Clustered to Deliver Time Series Analytics for On-Premises and Private Cloud Deployments

Company (InfluxData) — Wed, 06 Sep 2023 05:30:00 +0000

InfluxDB Clustered completes InfluxDB 3.0 commercial product line as successor to the hugely popular InfluxDB Enterprise product

SAN FRANCISCO – September 6, 2023 – InfluxData, creator of the leading time series platform InfluxDB, today announced InfluxDB Clustered, its self-managed time series database for on-premises or private cloud deployments. With the release of InfluxDB Clustered, InfluxData completes its commercial product line developed on InfluxDB 3.0, its rebuilt database engine optimized for real-time analytics with higher performance, unlimited cardinality, and SQL support.

InfluxDB Clustered is the evolution of InfluxDB Enterprise, InfluxData’s long-standing enterprise software product for on-premises and private cloud environments. With the release of InfluxDB Clustered, InfluxDB Enterprise customers gain all the capabilities of the reimagined InfluxDB 3.0, specifically packaged and configured for their own unique hosting environments and data storage requirements. Deployed natively in Kubernetes, InfluxDB Clustered combines the scale and flexibility of the cloud with the security and control of a self-managed infrastructure.

“This release brings InfluxDB 3.0’s fundamental tenets of performance – unlimited cardinality, high-speed ingest, real-time querying, and superior data compression – to customers deploying their own custom infrastructure,” said Rick Spencer, VP of Products, InfluxData. “With InfluxDB Clustered we complete our 3.0 product portfolio and deliver on our promise to customers, bringing the flexibility of the cloud and the power of InfluxDB 3.0 together for the self-managed stack.”

“InfluxDB 3.0 introduced a columnar storage engine, which is intended to ease cardinality limitations and broaden the database’s ecosystem via SQL support and Apache Arrow integrations,” said Rachel Stephens, Senior Analyst, RedMonk. “With InfluxDB 3.0 now available in InfluxDB Clustered, enterprise customers of InfluxData will be able to access these new features and widen the use cases for time series databases in their self-managed environments.”

Like the rest of the InfluxDB 3.0 product suite, InfluxDB Clustered delivers significant improvements over its predecessor, InfluxDB Enterprise, in the following ways:

100x faster queries on high-cardinality data with powerful analytics performance that independently scales ingest and query.
45x faster data ingest enables real-time analytics on leading-edge data.
90 percent reduction in storage costs enabled by low-cost object store and separation of compute and storage combined with best-in-category data compression.
Enterprise-grade security and compliance with encryption of data in transit and at rest with private networking options, single sign-on (SSO), attribute-based access control (ABAC), and support for fully air-gapped deployments.

“We rely on InfluxDB to manage hundreds of billions of metrics across our research facilities. InfluxDB 3.0 will allow us to ingest and analyze this high-cardinality data in real-time at a fraction of the cost,” said Gianpietro Previtali, System Administrator, European XFEL. “InfluxDB 3.0 is a truly bold release from InfluxData, with new columnar architecture and the benefits of separating compute and storage for performant, real-time queries across leading-edge data.”

“At Vertical Aerospace, we’re pioneering electric aviation, which requires real-time analysis of highly distributed time series data. InfluxDB 3.0 helps us manage this data with nearly infinite storage capacity and much lower TCO,” said Tom Makin, Software Engineering Manager, Vertical Aerospace. “InfluxDB 3.0 leverages Apache Arrow to efficiently process high-cardinality data from ingestion to compaction to querying. It allows our team to uncover mission-critical insights across operations in real-time.”

With high availability and unprecedented scalability, InfluxDB Clustered gives enterprises the power of the industry’s leading time series database with the security, compliance, and control of a self-managed service. InfluxData also recently announced the availability of InfluxDB Cloud Dedicated, a fully managed and scalable single-tenant InfluxDB cluster based on the InfluxDB 3.0 architecture and intended for large-scale time series workloads. Together, InfluxDB Cloud Dedicated and InfluxDB Clustered give enterprises multiple options in how they manage and scale time series workloads, whether in the cloud, in their own environment, or in combination for hybrid environments.

Visit the InfluxData website to learn more about InfluxDB Clustered.

About InfluxData