The 5Ws (and 1H) of the New InfluxDB Cloud
By Jason Myers / Feb 01, 2023 / InfluxDB Cloud, InfluxDB IOx, Community
Some things are inevitable, like Thanos, paying taxes, and change. While it would be nice to simply snap our fingers and deliver new products, things aren’t so simple in the real world. InfluxDB has been the leading time series database since January 2016. But we’re not content to rest on our laurels. The quest to improve InfluxDB is constant and ongoing. As of today, we’re beginning the rollout of an all-new and improved InfluxDB Cloud powered by IOx.
WHO: Anyone who works with time series data will benefit from the improvements we’ve made to InfluxDB Cloud. If you work with high-cardinality time series data, like tracing data, then InfluxDB Cloud will be of even greater interest because the new engine enables unlimited-cardinality use cases.
WHAT: In addition to support for unlimited cardinality, what else is new in InfluxDB Cloud? Well, we rebuilt the database’s storage engine from the ground up. InfluxDB has always been great at handling metrics, but now it has even better support for event data. Where tracing data could create performance issues with the previous storage engine, the new engine can handle the scale of tracing data easily.
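To make the cardinality point concrete, here is a minimal Python sketch that counts series for a metrics-style workload and a tracing-style workload. It assumes a simplified definition of series cardinality (the number of unique measurement and tag-set combinations, ignoring field keys), and the measurement names, tag names, and the `series_cardinality` helper are all made up for illustration. It is not how InfluxDB computes cardinality internally.

```python
from typing import Dict, Iterable, Tuple

# A point is reduced to (measurement, tags); field values and timestamps
# do not affect cardinality under this simplified definition.
Point = Tuple[str, Dict[str, str]]


def series_cardinality(points: Iterable[Point]) -> int:
    """Count unique (measurement, tag set) combinations (hypothetical helper)."""
    return len({(measurement, frozenset(tags.items())) for measurement, tags in points})


# Metrics-style data: 3 hosts x 2 regions, 100 points per combination.
metrics = [
    ("cpu", {"host": f"host-{h}", "region": region})
    for h in range(3)
    for region in ("us", "eu")
    for _ in range(100)
]

# Tracing-style data: every point carries its own trace_id tag value.
traces = [
    ("span", {"service": "checkout", "trace_id": f"trace-{i}"})
    for i in range(600)
]

print(len(metrics), series_cardinality(metrics))  # 600 points, 6 series
print(len(traces), series_cardinality(traces))    # 600 points, 600 series
```

Both workloads write the same number of points, but the tracing-style data creates a new series for every point. That growth pattern is what makes tracing a high-cardinality use case.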
We designed and optimized the new engine for low-latency queries. Incoming data is immediately available in a “hot” in-memory cache, which is what powers real-time analytics.
We also changed the persistence format for storing data. You may not initially think that this is a benefit, but by using the Parquet format we drastically improved data compression. This saves costs because you can store more data in less space in low-cost object storage, like S3. The Parquet format also allows interoperability with a whole host of data ecosystems, expanding the reach and value of your time series data.
Getting back to the topic of queries, one of the biggest updates is that InfluxDB Cloud now supports native SQL queries. Yes, you can still use InfluxQL or Flux for queries you already wrote, but SQL provides a familiar query language that makes ramping up on InfluxDB and time series data faster and easier.
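For readers who have not written SQL against time series data before, the sketch below shows the general shape of such a query: filter on a time range, then aggregate per tag. It uses Python’s built-in `sqlite3` module purely as a stand-in so the example runs anywhere. It does not connect to InfluxDB Cloud, and the `cpu` table, its columns, and the sample rows are assumptions made up for illustration. SQL dialects also differ in their time functions, so treat this as a sketch of the idea rather than a query to copy verbatim.

```python
import sqlite3

# In-memory SQLite database standing in for a time series store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cpu (time TEXT, host TEXT, usage_percent REAL)")

# Hypothetical sample data; ISO 8601 timestamps sort correctly as text.
rows = [
    ("2023-01-31T23:59:00Z", "host-b", 99.0),  # outside the queried window
    ("2023-02-01T00:00:00Z", "host-a", 10.0),
    ("2023-02-01T00:01:00Z", "host-a", 30.0),
    ("2023-02-01T00:00:00Z", "host-b", 50.0),
    ("2023-02-01T00:01:00Z", "host-b", 70.0),
]
conn.executemany("INSERT INTO cpu VALUES (?, ?, ?)", rows)

# Average and peak CPU usage per host since the start of Feb 1, 2023.
QUERY = """
SELECT host,
       AVG(usage_percent) AS avg_usage,
       MAX(usage_percent) AS max_usage
FROM cpu
WHERE time >= '2023-02-01T00:00:00Z'
GROUP BY host
ORDER BY host
"""

result = conn.execute(QUERY).fetchall()
for host, avg_usage, max_usage in result:
    print(host, avg_usage, max_usage)
```

The query uses only `SELECT`, `WHERE`, `GROUP BY`, and `ORDER BY`, which is the kind of familiarity the paragraph above refers to.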
WHEN and WHERE: The first phase of the new storage engine rollout starts today. The new InfluxDB Cloud engine is available in two AWS regions – one in North America and one in Europe. All new accounts in these regions will automatically have the new storage engine. In the coming months, we will continue to make the new storage engine available in the other AWS regions we currently support, as well as across the other cloud providers – Azure and GCP. Stay tuned for more details.
WHY: The goal here is pretty straightforward. We want to provide users with the best tools, performance, and experience when working with time series data. We had an opportunity to use cutting-edge technology to make InfluxDB better and that’s exactly what we did.
HOW: To understand how we accomplished all these updates and improvements, we need to look under the hood of InfluxDB. Our developers wrote the new storage engine in Rust. Our team also made extensive use of the Apache Arrow project, which is a language-agnostic software framework for developing data analytics applications that process columnar data. A few other projects within the Arrow ecosystem further propelled this project forward. We’ve already mentioned Apache Parquet, used as the data persistence format. Another major component is Apache DataFusion, which provides the native SQL support for InfluxDB Cloud, while Apache Arrow Flight SQL provides interoperability with other tools and systems.
The future of InfluxDB is bright and there are even bigger things to come. But for now, try out the new InfluxDB Cloud and see how it improves your Time to Awesome.