A Guide to In-Memory Databases
In-Memory Databases Explained
Among the many database options available to developers, the in-memory database is an increasingly popular choice for real-time workloads that require high speed, reliability, and low latency. In this article you will learn what in-memory databases are, how they work, some common use cases, and some popular in-memory databases that you can use in your own projects.
What is an in-memory database?
An in-memory database is a type of database that holds all of its data in memory rather than on disk, as traditional databases do. Random Access Memory (RAM) is significantly faster than disk, so an in-memory database can read and write data much faster. This speed comes with some tradeoffs, which are discussed later in the article. Depending on your use case and architecture, these downsides can often be mitigated, in which case adding an in-memory database to your application is mostly upside.
Why use an in-memory database?
An in-memory database is for situations where users need near real-time access to data. In short, developers choose an in-memory database for their application primarily for performance. Additionally, most in-memory databases provide a number of useful data structures and other abstractions that make working with your data easy as well as fast.
Example use cases include real-time fraud detection, risk management, and certain kinds of data analysis and forecasting. Some of these use cases are covered in more detail below, along with the ways that companies use this type of database technology.
Features of an in-memory database
Beyond providing raw performance, in-memory database projects offer some other features to stand out from competing projects. Below are some of the most common features.
Persistence
Most in-memory databases provide some way to persist data so that it survives a hardware failure. These built-in solutions make it easier to recover from outages but have their tradeoffs. The two primary strategies for persistence are snapshots and append-only files.
Snapshots are copies of the data in your in-memory database at a moment in time. They are typically generated at some fixed interval, such as every hour or every 5 minutes. If hardware fails, you can simply restart your database and load the latest snapshot into memory. The downside of snapshots is that you will lose any data added to the database since the last snapshot, so if you are taking snapshots every hour and the database fails 1 minute before the next snapshot, you will lose the last 59 minutes of data.
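To make the snapshot tradeoff concrete, here is a minimal sketch in Python. It is not how any particular database implements snapshots; the `SnapshotStore` class, its methods, and the JSON file format are all assumptions made for illustration.

```python
import json
import os
import tempfile


class SnapshotStore:
    """Toy key-value store (hypothetical) that persists via full snapshots."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        # On startup, reload the most recent snapshot if one exists.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.data = json.load(f)

    def set(self, key, value):
        self.data[key] = value  # lives only in memory until the next snapshot

    def get(self, key, default=None):
        return self.data.get(key, default)

    def snapshot(self):
        # Write to a temporary file and rename it into place, so a crash
        # during the write never leaves a half-written snapshot behind.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)


snapshot_path = os.path.join(tempfile.mkdtemp(), "snapshot.json")
store = SnapshotStore(snapshot_path)
store.set("user:1", "alice")
store.snapshot()            # "user:1" is now safe on disk
store.set("user:2", "bob")  # written after the snapshot, never persisted

restarted = SnapshotStore(snapshot_path)  # simulate a crash and restart
print(restarted.get("user:1"))  # alice
print(restarted.get("user:2"))  # None
```

The second key is lost because it was written after the last snapshot, which is exactly the data-loss window described above.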
Append-only files work by logging every write operation that the database receives. This log can then be used to reconstruct the database if it needs to restart. This is the safer of the two persistence methods but comes with a slight performance hit because every write must also be logged. It is still much faster than using a traditional database because, by only appending to disk, the database avoids expensive seek operations.
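The append-only approach can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `AppendOnlyStore` class and its JSON-lines log format are invented for this example, and calling `fsync` on every write is just one possible durability policy.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store (hypothetical) that logs every write to a file."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self._replay()  # rebuild in-memory state from the log, if any
        self.log = open(path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.path):
            return
        with open(self.path, "r", encoding="utf-8") as f:
            for line in f:
                op = json.loads(line)
                if op["cmd"] == "set":
                    self.data[op["key"]] = op["value"]
                elif op["cmd"] == "delete":
                    self.data.pop(op["key"], None)

    def _append(self, op):
        # Appending is sequential I/O: no seeking around the file.
        self.log.write(json.dumps(op) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())

    def set(self, key, value):
        self._append({"cmd": "set", "key": key, "value": value})
        self.data[key] = value

    def delete(self, key):
        self._append({"cmd": "delete", "key": key})
        self.data.pop(key, None)

    def close(self):
        self.log.close()


log_path = os.path.join(tempfile.mkdtemp(), "appendonly.log")
store = AppendOnlyStore(log_path)
store.set("a", 1)
store.set("b", 2)
store.delete("a")
store.close()

recovered = AppendOnlyStore(log_path)  # simulate a restart
print(recovered.data)  # {'b': 2}
recovered.close()
```

Because every operation is logged before it is applied in memory, replaying the log on restart reproduces the state as of the last write rather than the last snapshot.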
Advanced data structures and data types
In-memory databases provide various data structures for storing your data, each with different performance characteristics and developer productivity benefits. Redis, for example, provides basic types such as strings, lists, and hashes while also supporting Kafka-like streams and geospatial indexing.
Relational or NoSQL data model support
There are in-memory database options available for both NoSQL and relational data models, so no matter what your current architecture looks like, you can find an in-memory database to support it.
High availability
Another useful feature that many in-memory databases will support is some kind of high availability mode. This generally involves support for clusters of nodes which can be scaled up and down and distributed geographically across data centers. Additional features might include automatic failover, replication, and cost optimization.
Hybrid memory architectures
A more advanced feature of some in-memory databases is the ability to intelligently move some of your data to flash storage while maintaining performance. By keeping some of your data out of more expensive RAM, this hybrid architecture aims to deliver similar performance at up to 5x lower cost.
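One way such a split could work is to keep only a small index in RAM and store the values themselves on flash. The Python sketch below illustrates that idea only; it is not a description of any particular product, and the `HybridStore` class and its file layout are assumptions.

```python
import os
import tempfile


class HybridStore:
    """Toy store (hypothetical): index in RAM, values in a file on flash."""

    def __init__(self, path):
        self.index = {}                 # key -> (offset, length), kept in RAM
        self.file = open(path, "w+b")   # values, kept on cheaper storage

    def set(self, key, value):
        # Always append the value; the in-memory index points at the newest copy.
        self.file.seek(0, os.SEEK_END)
        offset = self.file.tell()
        self.file.write(value)
        self.file.flush()
        self.index[key] = (offset, len(value))

    def get(self, key):
        entry = self.index.get(key)
        if entry is None:
            return None                 # answered from RAM, no storage access
        offset, length = entry
        self.file.seek(offset)          # one positioned read per lookup
        return self.file.read(length)

    def close(self):
        self.file.close()


store = HybridStore(os.path.join(tempfile.mkdtemp(), "values.dat"))
store.set("a", b"hello")
store.set("b", b"world!")
store.set("a", b"hi")        # overwrites by appending a newer value
print(store.get("a"))        # b'hi'
print(store.get("b"))        # b'world!'
print(store.get("missing"))  # None
store.close()
```

In this layout the RAM cost grows with the number of keys rather than the size of the values, which is where the cost savings would come from.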
In-memory database use cases
Read heavy applications
One of the primary use cases, and the simplest, is using an in-memory database to cache data. This takes traffic load off application servers and also reduces latency because the cache can be deployed to the edge and in many cases won’t need to be updated frequently.
Caching is ideal for things like websites or social media applications where once something like a post or comment is made, it is rarely updated. Think about a social media post by a large account that gets millions of views but is only written once and maybe updated a few times at most. Any application with a high read-to-write ratio would be a good fit for an in-memory database and caching.
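The caching pattern described above can be sketched as follows. Everything here is illustrative: `SlowDatabase` is a hypothetical stand-in for a disk-backed primary database, and `CacheAside` with its 60-second time-to-live is an assumed design, not the API of any specific product.

```python
import time


class SlowDatabase:
    """Hypothetical stand-in for a disk-backed primary database."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def fetch(self, key):
        self.reads += 1  # count how often the primary database is hit
        return self.rows.get(key)


class CacheAside:
    """Check the in-memory cache first; fall back to the database on a miss."""

    def __init__(self, db, ttl_seconds=60.0, clock=time.monotonic):
        self.db = db
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}  # key -> (value, expires_at)

    def get(self, key):
        hit = self.cache.get(key)
        if hit is not None and hit[1] > self.clock():
            return hit[0]                      # served from memory
        value = self.db.fetch(key)             # miss or expired: go to the database
        self.cache[key] = (value, self.clock() + self.ttl)
        return value

    def invalidate(self, key):
        # Call this when the underlying row changes, e.g. a post is edited.
        self.cache.pop(key, None)


db = SlowDatabase({"post:1": "Hello, world"})
cache = CacheAside(db)
for _ in range(1000):
    cache.get("post:1")
print(db.reads)  # 1
```

A thousand reads of the same post reach the primary database only once, which is why a high read-to-write ratio makes caching so effective.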
Machine learning
Deploying machine learning solutions is challenging by itself. Even harder is making machine learning and deep learning models perform well without impacting user experience. One way in-memory databases can help is by running models at the edge to reduce latency and save battery if the user is on a mobile device. Many in-memory databases provide extensions that can run models created by common machine learning frameworks like TensorFlow or PyTorch. In-memory databases can also be used to deploy new models with no downtime.
Fraud detection
Fraud detection relies on having access to large amounts of historical data to find trends and then being able to make predictions with both speed and accuracy. It’s a fine balancing act between improving accuracy by doing more analysis and not hurting user experience by slowing down the checkout process. An in-memory database can help solve this problem because it can hold a huge amount of data in memory, where it can be queried and processed quickly.
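As a small illustration of why keeping recent history in memory helps, the Python sketch below flags a card that makes too many purchases within a short window. The rule itself, the `VelocityCheck` class, and its thresholds are hypothetical; real fraud systems combine many more signals.

```python
from collections import defaultdict, deque


class VelocityCheck:
    """Hypothetical rule: flag a card with too many purchases in a time window."""

    def __init__(self, max_events=3, window_seconds=60.0):
        self.max_events = max_events
        self.window = window_seconds
        self.events = defaultdict(deque)  # card id -> recent timestamps, in RAM

    def is_suspicious(self, card_id, timestamp):
        recent = self.events[card_id]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while recent and recent[0] <= timestamp - self.window:
            recent.popleft()
        return len(recent) > self.max_events


checker = VelocityCheck(max_events=3, window_seconds=60.0)
results = [checker.is_suspicious("card-1", t) for t in (0, 10, 20, 30, 100)]
print(results)  # [False, False, False, True, False]
```

Each check touches only a short in-memory queue, so the decision can be made without adding noticeable delay to the checkout request.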
Advantages of an in-memory database
Better performance through specialization
One reason in-memory databases can squeeze out so much performance is that they don’t have to make the tradeoffs that traditional disk-based databases do. An in-memory database will answer queries faster even than a conventional database that has all of its data cached in RAM, which might seem strange because in theory both databases are reading that data from memory.
For example, B-trees are generally used as the indexing data structure in relational databases because they map well to file structure and help avoid worst-case queries that involve random access. In comparison, an in-memory database doesn’t have to worry about disk seeks, so it can use alternatives to B-trees that provide better performance. By not having to handle all sorts of edge cases and by being explicit about what they don’t support, in-memory databases have room for optimizations that general-purpose databases simply can’t make.
Another example is that disk I/O is not on the critical path for in-memory databases the way it is for standard databases, and as a result, they don’t have to deal with the performance overhead of encoding data before writing it to disk.
Specialized algorithms available
Another side effect of disk no longer being a limiting factor is that a variety of algorithms can now be used for querying and analyzing data that wouldn’t be possible on a traditional database due to performance considerations. This opens up all sorts of new use cases and possibilities for how to use your data and thus create new features for your applications.
Disadvantages of an in-memory database
RAM is more expensive
The most obvious disadvantage of an in-memory database is that RAM is still more expensive than disk, although the price has dropped significantly in recent years. As a result, developers will need to determine what data is valuable enough to store in memory and what should be placed in cheaper storage.
More complex architecture
Most applications are going to require some permanent form of storage, so odds are that you will still need to operate your primary database in addition to your in-memory database. This adds operational burden and complexity to your app. Depending on the stage and scale of your business, this may or may not be worth it.
Examples of in-memory databases
Redis
Redis is the most popular in-memory database available, with a huge community around it resulting in strong integrations with a number of tools. Redis is open source and also provides a cloud and enterprise product with a variety of advanced features added to the open source version. It is a strong option due to its performance, number of features, and community.
Aerospike
Aerospike is an in-memory database with support for hybrid memory models. Aerospike has also added features that go way beyond most in-memory databases by providing support for graph algorithms and document-style structured data.
MySQL Memory storage engine
InfluxDB
InfluxDB is a dedicated time series database that borrows many design concepts from in-memory databases to improve write and query performance for time series data specifically. InfluxDB can be configured to use different amounts of memory and to write to flash memory for hybrid storage architectures, and it writes data to disk using an append-only log to get performance similar to in-memory databases.
In-memory database FAQ
Does an in-memory database hold data forever?
Without a hard disk or other storage media to persist to, an in-memory database will not hold data after a hardware failure or a database restart. That makes it different from the traditional relational database, where data is stored on disk.
In-memory database vs caching
Caching can be seen as one example of what an in-memory database is capable of doing. Caching is also typically implemented by simply keeping the most recently accessed data stored, whereas an in-memory database will use more advanced algorithms to determine which data is worth keeping in memory.
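The "keep the most recently accessed data" policy mentioned above is commonly known as least-recently-used (LRU) eviction. A minimal sketch in Python, with the `LRUCache` class and its capacity chosen purely for illustration:

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` items, evicting the least recently used one."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.items = OrderedDict()  # ordered from oldest to newest access

    def get(self, key, default=None):
        if key not in self.items:
            return default
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now more recently used than "b"
cache.put("c", 3)  # over capacity: "b" is evicted
print(list(cache.items))  # ['a', 'c']
```

The policy looks only at access order, which is what makes it simple compared with the more advanced strategies an in-memory database may apply.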
What is the alternative to an in-memory database?
An alternative to an in-memory database would be configuring a standard database to give it a large amount of RAM to use as a cache. However, as discussed above, the performance still won’t be as good as a dedicated in-memory database and you probably won’t have access to some of the other features that make life easier for developers.