Announcing InfluxDB Clustered: InfluxDB 3.0 for Self-Managed Environments
Rick Spencer / Sep 06, 2023
Today, we’re excited to announce InfluxDB Clustered, our latest product in the InfluxDB 3.0 product suite. InfluxDB Clustered is the evolution of InfluxDB Enterprise, our popular self-managed product for large-scale time series workloads. For enterprises, moving from InfluxDB Enterprise to InfluxDB Clustered brings an orders-of-magnitude leap in performance, with significant improvements across analytics, storage, and costs.
Like the rest of the InfluxDB 3.0 product suite, InfluxDB Clustered delivers high throughput for data writes and reads, support for unlimited data cardinality, real-time data analysis, and native SQL support for large time series workloads. InfluxDB 3.0 is developed in Rust and built on the Apache Arrow ecosystem (DataFusion, Parquet, Flight). Because Apache Arrow is forming the core of a new set of large-scale analytics tools, InfluxDB 3.0 benefits greatly from the inherent interoperability with this next-generation toolset.
Self-managed for large-scale workloads
InfluxDB Clustered departs from the InfluxDB Cloud Serverless and InfluxDB Cloud Dedicated products released earlier this year in that it is a self-managed product. This gives you ultimate control over your time series database, making it well-suited to meet enterprise and compliance requirements. InfluxDB Clustered runs where you need it – on-premises, in your private cloud, or self-managed public cloud environments. This flexibility comes from the fact that we deliver InfluxDB Clustered as a collection of Kubernetes-based containers with decoupled, independently scalable ingest and query tiers.
This architecture provides the high availability and scalability you need to build and iterate on technical infrastructure that meets your specific needs. Clustered allows you to scale your cluster up or down at will. Need to scale up for a few hours or days to accommodate an anticipated usage spike? Clustered lets you do that, too. No matter what your security or data residency requirements may be, InfluxDB Clustered can handle them.
High performance with unlimited scale
Developed on Apache Arrow for high-performance analytical queries, InfluxDB Clustered, like the rest of the InfluxDB 3.0 product suite, can handle high-speed, high-volume analytics in real time. This includes managing high-cardinality data without impacting performance.
Several factors contribute to this performance, one of which is the separation of compute and storage. Clustered’s self-managed configuration means that you can scale each component of the database to best suit the specific needs of your data. On the storage front, we also introduced multiple storage tiers.
Ingested data hits the hot storage tier first, where it is immediately available for querying. There’s no need to wait for batching or other processing on leading-edge data. This enables queries to be 45x faster than in previous versions of InfluxDB. The hot storage tier consists of the data that you’re actively using, which can include data retrieved from cold storage (more on that in a moment). Combining this hot storage approach with a 45x better data ingest rate and the ability to handle unlimited-cardinality data means that users can derive insights from large datasets in real time without degrading database performance.
Clustered also improves the way it handles historical data. The cold storage tier consists of low-cost cloud object storage. InfluxDB moves historical data out of the hot tier to the cold tier for long-term storage. This historical data is always available and there are no additional fees for retrieving data from cold storage for current queries.
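To make the hot/cold flow concrete, here is a minimal toy model in Python. It is only a sketch of the idea described above, not how InfluxDB Clustered is implemented: the `TieredStore` class, its method names, and the in-memory dictionaries standing in for fast storage and object storage are all hypothetical.

```python
class TieredStore:
    """Toy model of a hot tier backed by a cold tier (all names hypothetical)."""

    def __init__(self):
        self.hot = {}   # timestamp -> value; stands in for fast storage
        self.cold = {}  # timestamp -> value; stands in for object storage

    def write(self, timestamp, value):
        # New points land in the hot tier and are queryable right away.
        self.hot[timestamp] = value

    def age_out(self, cutoff):
        # Move points older than the cutoff from the hot tier to the cold tier.
        for ts in [t for t in self.hot if t < cutoff]:
            self.cold[ts] = self.hot.pop(ts)

    def query(self, start, end):
        # Serve the range from both tiers; cold hits are kept in the hot
        # tier afterwards so repeated queries find them there.
        result = {t: v for t, v in self.hot.items() if start <= t <= end}
        for t, v in self.cold.items():
            if start <= t <= end:
                result[t] = v
                self.hot[t] = v
        return sorted(result.items())


store = TieredStore()
store.write(1, "a")
store.write(2, "b")
store.write(10, "c")
store.age_out(cutoff=5)           # points 1 and 2 move to the cold tier
points = store.query(1, 10)       # historical and recent data in one query
```

In this sketch, a query that touches historical points pulls them back into the hot tier, which mirrors the idea that the hot tier holds whatever data is currently in use.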
90% reduction in storage costs
Storage is a big concern when it comes to time series data because sources produce massive volumes of it, especially at enterprise scale. Companies that want to get the maximum value from this data need to analyze it in real time, but they also need to hold onto it so they can use it for historical or predictive analysis.
With InfluxDB Clustered, organizations don’t need to choose between storing data and value-driven data analysis. InfluxDB 3.0 reduces storage costs by 90% or more, allowing you to store more data using less space and at a fraction of the cost. One of the big factors in this reduction is the use of low-cost cloud object storage mentioned above.
There’s another key factor in the storage/cost equation, though: data compression. There are two main components to data compression in InfluxDB 3.0. The first is the shift to a columnar database. This allows the database to compress each column individually. Because the values within a column are often similar, per-column compression can achieve significantly higher compression ratios.
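As a small illustration of why a columnar layout helps, the Python sketch below pivots a handful of made-up rows into columns. The field names and values are invented for the example, and the `to_columns` helper is hypothetical; the point is only that each resulting column holds values of one kind, which is what makes per-column compression effective.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Made-up sample rows: each row mixes a string, an integer, and a float.
rows = [
    {"host": "server-a", "time": 1000, "cpu": 0.50},
    {"host": "server-a", "time": 1010, "cpu": 0.52},
    {"host": "server-a", "time": 1020, "cpu": 0.51},
    {"host": "server-b", "time": 1000, "cpu": 0.10},
]

columns = to_columns(rows)
# columns["host"] now holds only host names, columns["time"] only
# timestamps, and columns["cpu"] only readings.
```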
The second is that InfluxDB uses Apache Parquet as its data persistence format. Parquet is a file format designed to work with columnar data structures, and it uses those structures to organize homogeneous data for better compression. It can use both dictionary encoding and run-length encoding to efficiently compress and store repeated values.
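The two encodings mentioned above can be sketched in a few lines of Python. This is a simplified illustration of the general techniques, not Parquet’s actual on-disk encoding; the function names and the sample column of host names are made up for the example.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with an integer index into a list of distinct values."""
    dictionary = []
    index_of = {}
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


def run_length_encode(codes):
    """Collapse consecutive repeats into (code, run_length) pairs."""
    return [(code, sum(1 for _ in run)) for code, run in groupby(codes)]


def decode(dictionary, runs):
    """Expand the runs and look each code up in the dictionary."""
    return [dictionary[code] for code, count in runs for _ in range(count)]


# A made-up column with many repeated values, as is typical of tag columns.
hosts = ["server-a"] * 4 + ["server-b"] * 3 + ["server-a"] * 2

dictionary, codes = dictionary_encode(hosts)
runs = run_length_encode(codes)
# dictionary == ["server-a", "server-b"]
# runs == [(0, 4), (1, 3), (0, 2)]
```

Nine strings are reduced to a two-entry dictionary plus three small pairs, and `decode(dictionary, runs)` restores the original column exactly.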
So, the combination of cheaper object storage and more highly compressed data means that you can retain more data, using less space, for less money.
Enterprise-grade security and compliance
As always, InfluxDB encrypts data in transit by default. Users can also expect to see enhanced security features added in the very near future. These include private networking options, single sign-on (SSO), audit logging, high availability, and attribute-based access control (ABAC).
Get started today
Going from InfluxDB Enterprise to InfluxDB Clustered is a gigantic leap forward. For a long time, users had to make difficult trade-offs among performance, data retention, and cost.
InfluxDB Clustered (and the rest of the InfluxDB 3.0 products) virtually eliminates those trade-offs. It delivers real-time performance on leading-edge and historical data while lowering total cost of ownership (TCO). Not only does this mean that you can do more with your data, but, because you manage your own infrastructure with InfluxDB Clustered, you can also make more cost-effective decisions that reduce initial startup costs and long-term maintenance and overhead.
We’re so excited about getting these capabilities into the hands of our users. To get started, request a proof of concept and our team of experts will contact you.