The Best NoSQL Database for IoT

Important characteristics of IoT data

Everything is being instrumented today. From cars to cities, fridges to smartwatches, more and more devices are being connected to the Internet of Things (IoT). These connected devices produce a massive amount of data, and deriving insights from that data requires a database equipped to handle it. Which is the best database for IoT?

The most central characteristic of IoT data is that it is Big Data. Even in a modest deployment (100 servers, each reporting 100 values spread across 9 subsystems every 10 seconds) we end up with roughly 86 million values per day. Many IoT applications generate far more data than this.
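The arithmetic behind that figure can be checked with a short sketch. It assumes the 100 values are the per-server total across all 9 subsystems, which is the reading that gives about 86 million; the function and constant names are illustrative only.

```python
# Back-of-the-envelope volume estimate for the deployment described above.
# Assumption: each server reports 100 values in total (across its 9 subsystems).
SERVERS = 100
VALUES_PER_SERVER = 100
SAMPLE_INTERVAL_S = 10
SECONDS_PER_DAY = 24 * 60 * 60


def values_per_day(servers, values_per_server, interval_s):
    """Number of individual values written in one day."""
    samples_per_day = SECONDS_PER_DAY // interval_s
    return servers * values_per_server * samples_per_day


daily = values_per_day(SERVERS, VALUES_PER_SERVER, SAMPLE_INTERVAL_S)
print(f"{daily:,} values per day")         # 86,400,000 values per day
print(f"{daily * 365:,} values per year")  # 31,536,000,000 values per year
```

That works out to a sustained 1,000 writes per second before any growth in the number of devices or metrics.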

The other important characteristic of IoT data is that while it’s constantly being inserted by IoT devices and being read by analytics tools, it’s not often updated after it’s written. It’s also not often deleted, outside of implementing a data retention policy.

In order to handle IoT data effectively, we’ll need a database that can handle a lot of time series data with fast read/write performance. Furthermore, since the majority of IoT use cases require high availability and scalability, these will become deciding factors.

Using SQL

Various flavors of SQL (MySQL, SQL Server, etc.) are many people’s first choice for databases. But these traditional databases don’t tend to work well for IoT use cases because the relational database model wasn’t designed with time series in mind.

Here’s one example of a way that SQL is suboptimal for IoT. Say we have the dataset described above, with 86M values written each day. It wouldn’t be feasible to keep everything in a single table indexed by time (as the primary key or part of a composite key): we would need to create many different lookup indices, and at such high volumes the table would become absurdly large after running for any significant period of time.

One way to try to improve this system is to create a separate table per day (or other period of time). This way, no individual table grows to an unmanageable size. However, it requires that back-end developers write application code to tie the tables together, then write even more code to compute summary statistics across tables and enforce retention policies.
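To give a feel for that glue code, here is a minimal sketch of the table-per-day approach. It uses Python’s built-in sqlite3 module as a stand-in for a SQL server; the table naming scheme, the column layout, and every function name are assumptions made for illustration, not something prescribed by any particular database.

```python
import sqlite3
from datetime import date, datetime, timedelta, timezone


def table_for(day):
    # Hypothetical naming scheme: one table per UTC calendar day.
    return f"readings_{day:%Y%m%d}"


def insert_reading(conn, ts, sensor_id, value):
    """Route a reading to its day's table, creating the table on first use."""
    table = table_for(ts.date())  # name is generated from a date, never from user input
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "ts INTEGER NOT NULL, sensor_id TEXT NOT NULL, value REAL NOT NULL, "
        "PRIMARY KEY (sensor_id, ts))"
    )
    conn.execute(
        f"INSERT INTO {table} VALUES (?, ?, ?)",
        (int(ts.timestamp()), sensor_id, value),
    )


def existing_tables(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE 'readings_%'"
    )
    return sorted(row[0] for row in rows)


def average(conn, sensor_id, start, end):
    """Mean value for one sensor over [start, end), stitched across daily tables."""
    present = set(existing_tables(conn))
    total, count = 0.0, 0
    day = start.date()
    while day <= end.date():
        table = table_for(day)
        if table in present:
            day_sum, day_count = conn.execute(
                f"SELECT COALESCE(SUM(value), 0.0), COUNT(*) FROM {table} "
                "WHERE sensor_id = ? AND ts >= ? AND ts < ?",
                (sensor_id, int(start.timestamp()), int(end.timestamp())),
            ).fetchone()
            total += day_sum
            count += day_count
        day += timedelta(days=1)
    return total / count if count else None


def apply_retention(conn, today, keep_days):
    """Drop every daily table older than `keep_days` days before `today`."""
    cutoff = table_for(today - timedelta(days=keep_days))
    dropped = [t for t in existing_tables(conn) if t < cutoff]
    for table in dropped:
        conn.execute(f"DROP TABLE {table}")
    return dropped


conn = sqlite3.connect(":memory:")
start = datetime(2024, 1, 1, 12, tzinfo=timezone.utc)
for offset, value in enumerate([10.0, 20.0, 30.0]):
    insert_reading(conn, start + timedelta(days=offset), "sensor-1", value)

print(average(conn, "sensor-1", start, start + timedelta(days=3)))  # 20.0
print(apply_retention(conn, date(2024, 1, 3), keep_days=1))  # ['readings_20240101']
```

Even this toy version needs its own routing, cross-table aggregation, and retention logic, and it still runs on a single server.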

Neither of these approaches addresses the scalability problem, either. In an IoT environment, it’s very easy to scale past what a single SQL server can handle. To fix this, many people turn from SQL to NoSQL.

Using NoSQL

Being non-relational, NoSQL databases don’t have the scaling problems of SQL, but the popular ones still have issues when it comes to working with IoT data.

Let’s use Cassandra as an example. Just deciding on a structure for row keys to properly utilize your cluster is a non-trivial task. On top of that, it’s still necessary to write application code to do query processing and downsampling, among other necessary tasks. Even after that’s finished, ongoing optimization is necessary to maintain peak query performance.
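As a rough illustration of the two tasks named above, the sketch below shows one common style of row key (a sensor ID combined with a time bucket, so that no single row grows without bound) and a simple averaging downsampler of the kind that would otherwise live in application code. It is plain Python and does not call Cassandra or any driver; the key layout, the bucket size, and the function names are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(sensor_id, ts, bucket_seconds=86_400):
    """Hypothetical row-key scheme: (sensor, start of the time bucket holding ts).

    A bucket that is too large produces oversized rows; one that is too
    small makes range reads touch many rows. Picking it is the hard part.
    """
    bucket_start = int(ts.timestamp()) // bucket_seconds * bucket_seconds
    return (sensor_id, bucket_start)


def downsample(points, window_seconds):
    """Average (epoch_seconds, value) points into fixed-width windows."""
    sums = defaultdict(lambda: [0.0, 0])  # window start -> [running sum, count]
    for ts, value in points:
        window = ts // window_seconds * window_seconds
        sums[window][0] += value
        sums[window][1] += 1
    return [(window, total / n) for window, (total, n) in sorted(sums.items())]


noon = datetime(2024, 1, 1, 12, tzinfo=timezone.utc)
print(partition_key("sensor-1", noon))

raw = [(0, 1.0), (5, 3.0), (10, 5.0)]
print(downsample(raw, window_seconds=10))  # [(0, 2.0), (10, 5.0)]
```

In a real cluster the bucket size would have to be tuned against actual write rates and query patterns, which is part of the ongoing optimization work described above.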

This process can take months and require many experienced back-end engineers. Further, the popular NoSQL databases used for IoT — MongoDB, Cassandra, etc. — are not the most effective solutions even after configuration and optimization.