The Best NoSQL Database for IoT
Important characteristics of IoT data
Everything is being instrumented today. From cars to cities, fridges to smart-watches, more things are being connected to the Internet of Things. These IoT-connected devices produce a massive amount of data, and deriving insights from that data requires a database equipped to handle it. Which is the best database for IoT?
The most central characteristic of IoT data is that it is Big Data. Even in a modest deployment (say, 100 servers, each reporting 100 values spread across 9 subsystems every 10 seconds) we end up with more than 86 million values per day. Many IoT applications generate much more data than this.
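The arithmetic behind that figure can be checked with a few lines of Python. This is a minimal sketch that assumes the 100 values are the total per server (spread over the 9 subsystems) rather than 100 per subsystem; the function name is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def values_per_day(servers, values_per_server, interval_seconds):
    """Number of individual measurements written in one day."""
    samples_per_day = SECONDS_PER_DAY // interval_seconds
    return servers * values_per_server * samples_per_day

daily = values_per_day(servers=100, values_per_server=100, interval_seconds=10)
print(f"{daily:,} values per day")  # 86,400,000 values per day
```

Shortening the interval to one second, or adding more devices, multiplies the total accordingly.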
The other important characteristic of IoT data is that while it's constantly being inserted by IoT devices and being read by analytics tools, it's not often updated after it's written. It's also not often deleted, outside of implementing a data retention policy.
In order to handle IoT data effectively, we'll need a database that can handle a lot of time series data with fast read/write performance. Furthermore, since the majority of IoT use cases require high availability and scalability, these will become deciding factors.
Various flavors of SQL (MySQL, SQL Server, etc.) are many people's first choice for databases. But these traditional databases don't tend to work well for IoT use cases because the relational database model wasn't designed with time series in mind.
Here's one example of a way that SQL is suboptimal for IoT. Say we have the dataset described above, with 86M values written each day. It wouldn't be feasible to create a single table indexed by time (as the primary key or part of a composite key), because we would need to create many different lookup indices, and the table becomes absurdly large when running for any significant period of time on such high volumes of data.
One way to try to improve this system is by creating a separate table per day (or other period of time). This way, each individual table doesn't grow to an inoperable size. However, this requires that back-end developers write application code to tie the tables together, then write even more code to allow you to use summary statistics and create retention policies.
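To make that cost concrete, here is a rough sketch of the glue code a table-per-day layout pushes into the application. It uses Python's built-in sqlite3 module as a stand-in for any SQL database; the `readings_YYYYMMDD` naming scheme and every function name are assumptions made for illustration, not a recommended design.

```python
import sqlite3
from datetime import date, timedelta

def table_for(day):
    # Hypothetical naming scheme: one table per calendar day.
    return f"readings_{day:%Y%m%d}"

def write(conn, day, ts, sensor, value):
    name = table_for(day)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {name} (ts INTEGER, sensor TEXT, value REAL)")
    conn.execute(f"INSERT INTO {name} VALUES (?, ?, ?)", (ts, sensor, value))

def average(conn, sensor, first_day, last_day):
    # The application has to stitch the daily tables together itself.
    existing = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    total, count = 0.0, 0
    day = first_day
    while day <= last_day:
        name = table_for(day)
        if name in existing:
            day_sum, day_count = conn.execute(
                f"SELECT COALESCE(SUM(value), 0), COUNT(*) FROM {name} WHERE sensor = ?",
                (sensor,)).fetchone()
            total += day_sum
            count += day_count
        day += timedelta(days=1)
    return total / count if count else None

def apply_retention(conn, keep_from):
    # Retention is application code too: drop every table older than keep_from.
    cutoff = table_for(keep_from)
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name LIKE 'readings_%'")]
    for name in names:
        if name < cutoff:  # fixed-width dates compare correctly as strings
            conn.execute(f"DROP TABLE {name}")

conn = sqlite3.connect(":memory:")
day1, day2 = date(2024, 1, 1), date(2024, 1, 2)
write(conn, day1, 0, "temp", 20.0)
write(conn, day2, 86400, "temp", 22.0)
before = average(conn, "temp", day1, day2)  # spans both daily tables
apply_retention(conn, keep_from=day2)       # drops the table for day1
after = average(conn, "temp", day1, day2)
print(before, after)
```

Even this toy version needs a routing function, a cross-table aggregate, and a retention job, and it still runs on a single server.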
Neither of these approaches accounts for the scalability problem, either. In an IoT environment, it's very easy to scale past what a single SQL server can handle. To fix this, many people turn from SQL to NoSQL.
Being non-relational, NoSQL databases don't have the same scaling problems as SQL, but the popular ones still have issues when it comes to working with IoT data.
Let's use Cassandra as an example. Just deciding on a structure for row keys to properly utilize your cluster is a non-trivial task. On top of that, it's still necessary to write application code to do query processing and downsampling, among other necessary tasks. Even after that's finished, ongoing optimization is necessary to maintain peak query performance.
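As an illustration of why the key structure matters, the sketch below models one common pattern in plain Python: bucketing each device's data by day. It does not use Cassandra or any Cassandra driver; the dictionary stands in for the cluster, and the `(device, day)` key and all function names are hypothetical. Keying on the device alone would let a partition grow without bound, while keying on time alone would send every device's writes to the same partition, so the bucket size is a trade-off you have to choose and then live with in your query code.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def partition_key(device_id, ts):
    """Hypothetical row key: (device, UTC day)."""
    return (device_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))

def keys_for_range(device_id, start, end):
    """Every partition a time-range query has to visit, oldest first."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    keys = []
    while day <= last:
        keys.append((device_id, day.strftime("%Y-%m-%d")))
        day += timedelta(days=1)
    return keys

# Stand-in for the cluster: partition key -> list of (timestamp, value) rows.
partitions = defaultdict(list)

def write(device_id, ts, value):
    partitions[partition_key(device_id, ts)].append((ts, value))

def read(device_id, start, end):
    # Query processing lives in the application: fan out, filter, merge.
    rows = []
    for key in keys_for_range(device_id, start, end):
        rows.extend(r for r in partitions.get(key, []) if start <= r[0] <= end)
    return rows

t1 = datetime(2024, 1, 1, 23, 59, 50, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 2, 0, 0, 0, tzinfo=timezone.utc)
write("pump-7", t1, 1.0)
write("pump-7", t2, 2.0)
write("pump-8", t1, 3.0)
print(len(partitions))                            # number of partitions created
print([v for _, v in read("pump-7", t1, t2)])     # values read across two buckets
```

Two readings taken ten seconds apart land in different partitions here, which is exactly the kind of detail the application's read path has to account for.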
This process can take months and require many experienced back-end engineers. Further, the popular NoSQL databases used for IoT — MongoDB, Cassandra, etc. — are not the most effective solutions even after configuration and optimization.
The best NoSQL database for IoT
Because IoT data is, after all, time series data, the optimal solution would be a purpose-built time series database. Ideally, it would come out of the box equipped to work with IoT data, using a time series optimized data store and compression method. It would be able to downsample and aggregate data over time easily, and it would have lightning-fast read and write times.
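For readers unfamiliar with the term, downsampling means replacing many raw points with one aggregate per time window. The sketch below shows the idea in plain Python, assuming integer Unix timestamps in seconds and a mean aggregate; it is not how any particular time series database implements the operation, and the function name is illustrative.

```python
from collections import defaultdict
from statistics import mean

def downsample(points, window_seconds):
    """Average (timestamp, value) pairs into fixed-size windows.

    Returns a list of (window_start, mean_value), sorted by window start.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        window_start = ts - (ts % window_seconds)  # align to the window boundary
        buckets[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]

# Five raw readings at 10-second resolution, reduced to 1-minute averages.
raw = [(0, 10.0), (10, 12.0), (20, 14.0), (60, 20.0), (70, 22.0)]
print(downsample(raw, 60))  # [(0, 12.0), (60, 21.0)]
```

A purpose-built time series database is expected to offer this kind of aggregation as a built-in query or background task, so it does not have to be written and maintained in application code.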
Fortunately for IoT application developers, this database already exists. InfluxDB is an open source time series database used by IoT companies like tado°, Spiio, and Bboxx, among others. The setup takes minutes, not months, and requires very little development effort.