Selecting a time series database
Time series databases have grown sharply in popularity over the past three years, and the trend shows no sign of stopping. In a growing number of industries, time series data (data points indexed primarily by timestamp) is produced in the course of DevOps monitoring, infrastructure and application monitoring, sensor data collection, and real-time analytics. To make use of all this data, it’s critical to choose a database that is well suited to storing time series data.
Which time series database is best for you depends on your use case, but the decision comes down to four main factors:
- How much data there will be at any given time (data volume)
- What types of queries you’ll be running on that data
- What kind of data you’re storing
- The rates at which you’ll need to read and write data
If your data volume is fairly low (fewer than a million total data points stored at any given time, with fewer than a thousand points written per second), a standard relational database like MySQL, SQL Server, or PostgreSQL will work just fine. These databases aren’t designed for time series data, but the resulting performance problems aren’t painful at a small scale.
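As a rough illustration of the small-scale case, the sketch below stores timestamped readings in an ordinary relational table and runs a time-range query over them. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL, SQL Server, or PostgreSQL; the table name, column names, sensor identifier, and the helper `average_between` are all assumptions made for this example.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a
# relational database such as MySQL, SQL Server, or PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE readings (
        ts     INTEGER NOT NULL,  -- Unix timestamp in seconds
        sensor TEXT    NOT NULL,  -- hypothetical sensor identifier
        value  REAL    NOT NULL
    )
    """
)
# Index on (sensor, ts) so per-sensor time-range queries can avoid a full scan.
conn.execute("CREATE INDEX idx_readings_sensor_ts ON readings (sensor, ts)")

# One reading per minute for one hour, starting at an arbitrary timestamp.
start = 1_700_000_000
rows = [(start + 60 * i, "sensor-1", 20.0 + 0.1 * i) for i in range(60)]
conn.executemany("INSERT INTO readings (ts, sensor, value) VALUES (?, ?, ?)", rows)
conn.commit()


def average_between(conn, sensor, t0, t1):
    """Average value for one sensor over the half-open interval [t0, t1)."""
    cur = conn.execute(
        "SELECT AVG(value) FROM readings WHERE sensor = ? AND ts >= ? AND ts < ?",
        (sensor, t0, t1),
    )
    return cur.fetchone()[0]


# Average over the first ten minutes of data.
first_ten_minutes = average_between(conn, "sensor-1", start, start + 600)
```

At this scale a timestamp index is usually all the tuning such a table needs. The same schema becomes harder to manage as the row count and write rate grow.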
For any data volume higher than a million data points, a purpose-built time series database will be best. Because their data stores are optimized for time series data, these databases can answer queries faster over large quantities of time-stamped data.
That being said, nearly all time series databases (TSDBs) are NoSQL databases. Their optimization for time series data often comes at the expense of features common in SQL databases, such as ad-hoc querying. If you’ll be running a lot of ad-hoc queries, especially on a small or medium-sized dataset, it might be better to stick with a SQL database, or to use a general-purpose document database such as MongoDB.
If you’re storing a large quantity of time series data and will mainly be running the same set of queries over it, the next consideration is what type of data you’re storing. Different time series databases are optimized for different data types. If you’re going to be storing data from an Internet of Things (IoT) setup, a database like InfluxDB, which stores values as standard data types such as integers, floating-point numbers, strings, and booleans, will work well. If you need to store more complex data types, a database like Google Bigtable, which treats all data as raw byte strings, might be for you.
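The guidance so far can be condensed into a small decision helper. This is only a sketch of the rules of thumb described above, not a definitive selection procedure: the `Workload` fields and the `recommend_store` function are hypothetical names introduced for illustration, and the thresholds are the one-million-point and one-thousand-writes-per-second figures mentioned earlier.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload, for illustration only."""
    stored_points: int           # total data points stored at any given time
    writes_per_second: int       # points written per second
    mostly_ad_hoc_queries: bool  # True if queries are largely ad hoc
    complex_data_types: bool     # True if values are not plain numbers/strings/booleans


def recommend_store(w: Workload) -> str:
    """Encode the article's rules of thumb as a simple decision chain."""
    # Low volume: a standard relational database is good enough.
    if w.stored_points < 1_000_000 and w.writes_per_second < 1_000:
        return "relational database (MySQL, SQL Server, PostgreSQL)"
    # Heavy ad-hoc querying: TSDBs often give this feature up.
    if w.mostly_ad_hoc_queries:
        return "SQL database or MongoDB"
    # Complex values: a store that treats data as raw bytes.
    if w.complex_data_types:
        return "byte-oriented store such as Google Bigtable"
    # Large volume, repeated queries, standard data types.
    return "time series database such as InfluxDB"


# Example: an IoT deployment with a large volume of simple numeric readings.
iot = Workload(
    stored_points=200_000_000,
    writes_per_second=5_000,
    mostly_ad_hoc_queries=False,
    complex_data_types=False,
)
choice = recommend_store(iot)
```

The order of the checks mirrors the order in which the factors are discussed: volume first, then query style, then data type.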
Read and write times
The final consideration is how quickly you’ll need to write data to and read data from your database. Even if you’ve weighed the other three factors perfectly, the database still has to keep up with your workload, so choosing a high-performance one is critical. With a high enough ingest rate, real-time analysis is possible even on very large datasets, with rows numbering in the hundreds of millions.
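To get a feel for the numbers involved, a back-of-the-envelope estimate of write rate and stored volume can help. The sketch below is an assumption-laden illustration: the function `estimate_load` and its parameters are hypothetical, and it assumes every source emits one point per fixed interval with a fixed retention period.

```python
SECONDS_PER_DAY = 86_400


def estimate_load(sources, sample_interval_seconds, retention_days):
    """Return (writes_per_second, stored_points) for a steady workload.

    Assumes each source writes one data point every
    `sample_interval_seconds` and that points are kept for `retention_days`.
    """
    writes_per_second = sources / sample_interval_seconds
    stored_points = writes_per_second * retention_days * SECONDS_PER_DAY
    return writes_per_second, stored_points


# Example: 500 sensors, one reading every 10 seconds, kept for 30 days.
writes_per_second, stored_points = estimate_load(
    sources=500, sample_interval_seconds=10, retention_days=30
)
```

In this example the write rate is only 50 points per second, yet the retained data set comes to roughly 130 million points, well past the one-million-point mark discussed earlier. Retention period can matter as much as ingest rate when sizing a database.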
InfluxDB offers a higher data ingest rate and faster queries than MongoDB, Cassandra, OpenTSDB, and many other general-purpose and time series databases. It also requires less disk space, thanks to its optimized time series data store.
These performance advantages have led to InfluxDB being consistently ranked as the most popular time series database by DB-Engines for over four years. They have also led to its wide adoption across industries and use cases, in both its open source and paid editions.