The Advantage of Cold Storage in InfluxDB



Imagine, if you will, having hundreds of devices that you need to monitor. All these devices generate data at sub-second intervals, and you need all that high-fidelity data for historical analysis and to feed machine learning models. Storing all that data can get really expensive, really fast. When that happens, you face a hard choice: pay to keep all your data, or discard some of it and sacrifice insights and analysis. For many readers, this may not be much of a stretch of the imagination.

Multiple storage tiers

One of the critical architectural changes in InfluxDB 3.0 is the separation of storage and compute. The storage layer is further divided into hot and cold tiers. The hot storage tier lives in memory and is where new data lands first. Frequently queried data also lives in the hot storage tier. This is one of the crucial elements that enables InfluxDB 3.0 to provide real-time analysis of incoming data.

Of course, keeping vast amounts of time series data in memory is cost-prohibitive for pretty much any user. Once data ages out of the hot tier, it needs somewhere to go. At that point, the data enters the cold storage tier.
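To make the hot/cold split concrete, here is a minimal Python sketch of age-based tiering. It is a toy model under stated assumptions, not how InfluxDB 3.0 is implemented: the `TieredStore` class and its `hot_retention_s` threshold are hypothetical, a plain list stands in for object storage, and it ignores the caching of frequently queried data mentioned above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Toy model of hot/cold tiering; hypothetical, NOT InfluxDB's implementation.
Point = Tuple[int, float]  # (timestamp in seconds, value)


@dataclass
class TieredStore:
    hot_retention_s: int  # hypothetical age threshold for the hot tier
    hot: List[Point] = field(default_factory=list)   # in-memory, recent data
    cold: List[Point] = field(default_factory=list)  # stand-in for object storage

    def write(self, ts: int, value: float) -> None:
        # New data always lands in the hot tier first.
        self.hot.append((ts, value))

    def age_out(self, now: int) -> int:
        """Move points older than the retention window to the cold tier."""
        cutoff = now - self.hot_retention_s
        expired = [p for p in self.hot if p[0] < cutoff]
        self.hot = [p for p in self.hot if p[0] >= cutoff]
        self.cold.extend(expired)
        return len(expired)

    def query(self, start: int, end: int) -> List[Point]:
        # A query spanning both tiers merges results from memory and cold storage.
        return sorted(p for p in self.hot + self.cold if start <= p[0] < end)


# Usage: two hours of readings every 10 minutes, with a one-hour hot window.
store = TieredStore(hot_retention_s=3600)
for t in range(0, 7200, 600):
    store.write(t, float(t))
moved = store.age_out(now=7200)
```

Queries over recent time ranges touch only the in-memory list, while older ranges fall through to the cold tier. A real system would also compress and persist the cold data rather than keep it in a Python list.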

Data compression

InfluxDB does several things to optimize the cold storage tier. The first is to compress the data as much as possible. In InfluxDB, compression happens in two places. One is the database itself, which uses Apache Arrow and a columnar layout to improve compression. The other is the Apache Parquet file format, which takes that Arrow data and compresses it even further. For more details on this, check out this blog post.
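The columnar advantage can be illustrated with nothing but the Python standard library. This sketch is a simplified toy, not InfluxDB's or Parquet's actual encoding: it serializes the same synthetic sensor readings (a seeded random walk, so the data is made up) once row by row and once column by column, delta-encodes the timestamp column, and compresses both with `zlib`. Storing similar values next to each other, and turning a steadily increasing timestamp into a run of identical deltas, gives the compressor more repetition to exploit.

```python
import random
import zlib
from itertools import accumulate

# Toy illustration only; real columnar formats use richer encodings.
rng = random.Random(42)
n = 1000
timestamps = [1_700_000_000_000 + i * 1000 for i in range(n)]  # one reading per second, in ms
devices = [f"sensor-{i % 4}" for i in range(n)]                # four hypothetical devices
values = []
v = 20.0
for _ in range(n):
    v += rng.choice([-0.1, 0.0, 0.1])  # slow random walk, like a temperature
    values.append(round(v, 1))

# Row-oriented: every line interleaves a timestamp, a tag, and a value.
row_bytes = "".join(
    f"{t},{d},{x}\n" for t, d, x in zip(timestamps, devices, values)
).encode()

# Column-oriented: each column stored contiguously; timestamps delta-encoded
# (first value kept as-is, then the difference from the previous timestamp).
deltas = [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]
col_bytes = "\n".join([
    ",".join(map(str, deltas)),
    ",".join(devices),
    ",".join(map(str, values)),
]).encode()

row_size = len(zlib.compress(row_bytes, 9))
col_size = len(zlib.compress(col_bytes, 9))

# Decoding the delta column is a running sum that restores the original timestamps.
restored = list(accumulate(deltas))

print(f"row layout: {row_size} bytes, columnar layout: {col_size} bytes")
```

Real columnar formats such as Parquet go further, with encodings like dictionary and run-length encoding and compression codecs such as Snappy or Zstandard, but the underlying principle is the same.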

Object store

All that highly compressed data needs a final resting place, and InfluxDB uses low-cost cloud object storage (e.g., Amazon S3) for that purpose. Object storage is far cheaper than SSD and in-memory options, so saving compressed data on object storage can cut storage costs substantially, by 90% or more in some cases. This also means that you can store more data for less, so you no longer have to choose between saving all your data and having thorough data analysis. InfluxDB gives you both.
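The savings come from two factors that multiply: compression shrinks the number of bytes stored, and object storage charges less per byte. The back-of-the-envelope sketch below makes that arithmetic explicit. Every price and ratio in it is a hypothetical placeholder chosen for round numbers, not a quoted vendor price or a measured InfluxDB compression ratio.

```python
# Back-of-the-envelope storage cost model. All prices and ratios below are
# HYPOTHETICAL placeholders, not vendor quotes; substitute your own numbers.


def monthly_cost(raw_gb: float, compression_ratio: float, price_per_gb: float) -> float:
    """Monthly cost of storing raw_gb after shrinking it by compression_ratio."""
    return raw_gb / compression_ratio * price_per_gb


raw_gb = 10_000      # hypothetical: 10 TB of raw time series data
ssd_price = 0.10     # hypothetical $/GB-month for SSD-backed storage
object_price = 0.02  # hypothetical $/GB-month for object storage

# Baseline: uncompressed data on SSD. Alternative: compressed data on object storage.
baseline = monthly_cost(raw_gb, compression_ratio=1.0, price_per_gb=ssd_price)
tiered = monthly_cost(raw_gb, compression_ratio=5.0, price_per_gb=object_price)
savings = 1 - tiered / baseline

print(f"baseline ${baseline:,.0f}/mo, object store ${tiered:,.0f}/mo, savings {savings:.0%}")
```

With these placeholder numbers, the savings work out to 1 − 0.02 / (5 × 0.10) = 96%; different prices or compression ratios will give different figures, which is why the savings vary from case to case.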

Try InfluxDB for yourself and see what a difference it makes.