A Guide to Key-Value Databases
How It Works, Key Features, Advantages
The key-value database is one of several significant innovations in database design over the past couple of decades. Like many newer database designs, it replaces some traditional design aspects with new approaches that resonate with developers.
By revolutionizing the ways in which data is stored and queried, key-value databases allow businesses to do more things with the data at their disposal.
What is a key-value database?
A key-value database (also known as a key-value store) is a type of noSQL database. Unlike prior relational databases that stored data in defined tables and columns, a key-value database instead uses individual or combinations of keys to retrieve associated values. Together they are known as key-value pairs. These values can be anything from simple data types like strings or integers to complex objects with multiple nested values. Each key is a unique identifier that maps to a value or a location of data being stored.
People describing key-value database design sometimes explain that this new type of database is like object-oriented programming applied to database systems. Some of the same ideas about labeling objects in their properties in OOP also apply to the key-value database and the way it is built.
How do key-value databases work?
Under the hood, key-value databases work by keeping an in-memory data structure that is mapped to the data stored on disk. RAM is much faster than accessing data from disk so most databases will have some sort of algorithm to keep frequently accessed data in RAM and only fallback to disk if the index isn’t already stored in memory.
Some characterize key-value databases as the ‘simplest’ type of noSQL database. It can be useful for scalable data setups and other enterprise uses that need flexibility. By allowing a freer orientation of data, the key-value database offers a way to store data for more flexible retrieval, in a way that accommodates schema changes that are often needed as application needs change during development.
Key-value databases can differ significantly in their implementation depending on the intended use case. Some differences between databases can include:
- consistency model
- whether keys can be sorted
- serialization strategies
Features of a key-value database
The most basic features that are required for any key-value database are the ability to create, update, retrieve, and delete data using keys. But many of the most popular databases provide features beyond the basics to make developers more productive. Below are some of the most common features provided by popular key-value databases.
Data type support
Many key-value databases provide support for defined data types and semi-structured data. This can be things like arrays or nested dictionaries. By giving the database more information about your data, there is room for more optimization in terms of storage and query performance.
The simplest implementations of key-value databases have keys which can be accessed directly. One common feature that is useful is to have keys sorted in some way so that the keys can be efficiently iterated over. Some common use cases for this feature:
- Grabbing all keys that start with a certain letter
- Grabbing all keys within a range of numbers
- Grabbing all keys less than or greater than a certain number
- Grabbing keys within a certain period of time if the key is a timestamp
Secondary key/index support
Some key-value databases allow you to define multiple different keys to access the same information. For example if you are storing user data you might want to be able to find that information by using either the name, email address, or phone number. Secondary key support makes all of these options possible, rather than being forced to choose a single key.
Replication and partitioning
Many key-value databases provide support for advanced scaling features out of the box. Replication means you can have multiple nodes with copies of the same data. This not only helps with scalability but also disaster recovery; if one node goes down, you still have your data.
Partitioning is how your data is broken up across nodes. Many databases provide a default way of doing this but also give you the option of defining exactly how you want your data to be partitioned. A simple example of this would be using the first letter of each key as the partition, which would result in 26 partitions, one for each letter in the alphabet.
More advanced key-value databases will have automatic support for distributing your database across multiple data centers. This makes your application more reliable as well as improving performance because you can respond to queries close to users around the world by using local data centers.
While much of the performance gains possible by NoSQL databases are due to dropping support for things like ACID (Atomicity, consistency, isolation, durability), many key-value databases will have the option to use ACID transactions when required, at the cost of some performance loss. Simply having the option is a huge benefit for developers because they can use it when they need it but still get great performance for situations where it isn’t necessary.
Advantages of key-value databases
Now that you are familiar with some of the general features of key-value databases, it’s time to look at the specific advantages provided by them and why developers choose to use them.
The primary selling point of key-value databases and NoSQL databases in general is the scalability they provide compared to relational databases. These DBs became popular after big tech companies like Amazon and Google wrote about the databases they built internally to handle scaling problems.
Databases typically become the primary bottleneck for software and many developers had felt the pain of trying to implement replication, sharding, and other strategies used for scaling out relational databases. Being able to abstract that away and focus on writing code that drove business value appealed to many tech companies and is why usage of key-value databases has grown so fast.
A secondary benefit of key-value databases is developer productivity. A big part of that was mentioned above already — not having to spend so much time trying to scale the database lets developers focus on other things.
Additionally, the schema-less nature of key-value and NoSQL databases made it much easier to iterate while writing code. Making changes to schema with a relational database requires migrations and potential downtime, which doesn’t happen with a key-value database.
There’s also the idea of “impedance mismatch” which relates to the mental model of how your data is manipulated in your code compared to how it is stored in a relational database. Trying to map the objects you are using in your code to a bunch of different tables in your relational database doesn’t feel natural in many cases. Key-value databases mostly eliminate this problem and make it easier for software engineers to work with their data.
Even ignoring the scalability features provided by key-value databases, for various use cases a single node key-value database has performance benefits in terms of reading and writing data compared to a more general-purpose relational database.
Disadvantages of key-value databases
When it comes to computer science, there is no perfect solution to all problems. While we’ve covered some of the advantages of key-value databases, there are always going to be tradeoffs. In this section, you will learn about some of the downsides of using a key-value DB and how to determine whether using a key-value database makes sense depending on your use case. Many of these downsides are simply tradeoffs of the advantages mentioned above and won’t be an issue if you use the right tool for the job.
It should also be noted that many of these “disadvantages” have solutions provided by modern key-value databases or multi-model databases built on top of key-value stores. But it’s useful to know some of the potential pitfalls regardless.
Lack of ACID support
Many key-value databases don’t provide support for ACID to improve their scalability. In the early days of NoSQL adoption, many developers would try to compensate for this by essentially replicating transactions in their application code which led to many problems.
While being schema-less can improve developer productivity in the short term, if an engineering team isn’t disciplined enough, their data model can become a mess if they don’t plan properly. Being able to change schema on the fly can cover for poor planning and lead to long-term problems. In some ways, being forced to map out your data model with a relational database can be seen as a benefit.
Advanced query support not available
A standard key-value database implementation doesn’t provide any sort of insight into what the actual value contains — when you grab the value using the key you have no guarantee of what you are getting. This means that you will have to filter or process data you don’t need in your application code. This will generally be less efficient in terms of performance compared to doing most of that work in the database.
The lack of query language also means that logic that would normally be kept in the database is now in your application code, which can lead to complexity and make maintaining code more difficult.
Updating values can also be inefficient because the entire chunk of data has to be replaced, even if you just want to update a single field in a nested data structure.
Potentially less efficient storage and query optimization
With defined schema types, a relational database is able to optimize storage by using compression in some cases and can also optimize common queries like getting aggregates of column values.
Key-value database use cases
In this section, you’ll learn about some common use cases for key-value DBs. This can range from being used as the primary database for an entire application to being used for only a few niche use cases within an app.
One common design pattern that works with many apps is to use a key-value database like Redis to improve the read performance of an application. A relational database can act as the source of truth where data is written and that data is then pushed out to a number of geographically distributed key-value database nodes. This results in reduced latency because the data is closer to users and also makes an app more scalable and reliable.
Key-value databases can also be used for storing pre-computed data that is crucial to user experience. An example of this is Twitter generating users’ news feeds ahead of time and caching them so users get a faster homepage load.
Storage engine for higher-level databases
Many databases use key-value DBs under the hood as a storage engine due to their raw performance and to save development time by not reinventing the wheel. RocksDB is an open-source embedded key-value database created by Facebook that has been used by or is supported by MySQL, Cassandra, MariaDB, MongoDB, YugabyteDB, and InfluxDB.
Internet of Things
Many businesses in many different industries are using sensors and related technology to collect more data about operations. It might be related to manufacturing and product development, or the use of a service model to retrieve data for customers. Companies might be collecting data about supplier and vending contracts, and how those operations work.
A corresponding benefit applies to new communications models like the Internet of Things, where a greater volume of devices are used to move data through a business network. In the IoT, data is “always in transit” – filtered through a greater number of hardware hops, with all of the logistical issues that may entail.
In response, modern engineering has conceived of ways to do processing closer to the origin of individual data points. Experts often promote the idea of computing “close to the edge” with a noSQL database – in the environment that stores data where the devices are collecting information. Key-value databases complement this kind of data operation. Because of their flexibility, they allow for better and more capable handling of this volatile activity as well.
The use of a key-value database in general and an Influx time series model, in particular, can merge with other strategies to build business efficiencies. For example, the better use of time-stamped data can dovetail with data visualization that provides business insights.
Another way to achieve more with these types of noSQL database setups is to tie them to dynamic resources in vendor service models. Serverless function is one prime example of how this works. By utilizing AWS Lambda or some serverless function service, the business user can complement the robust database systems around the time-stamped data in a way that doesn’t waste computing power.
Key value database examples
In this section, you will learn about a few popular examples of key-value databases in the real world as well as some of the precursor databases that led to the current popularity of NoSQL and key-value databases.
Berkeley DB is one of the first key-value database implementations. It was created at the University of California, Berkeley in 1991 for their BSD operating system and a replacement for the proprietary DBM equivalent written by AT&T for Unix. What makes Berkeley DB unique is that it is an embedded key-value store, meaning that by default it didn’t provide network access and was meant to be embedded within an app. Many of the architecture decisions for the sake of simplicity and performance benefits can be considered a precursor to NoSQL.
Berkeley DB inspired similar embedded key-value databases created by Facebook and Google called RocksDB and LevelDB.
Dynamo was a very influential paper published by Amazon about their internal key-value database which was used to scale their Amazon Marketplace. While many of the concepts used by Dynamo had been around for decades, Amazon brought them into the mainstream and proved that there was commercial value in using NoSQL-type databases.
Redis is a fully in-memory key-value database. This means that all data is stored in RAM rather than on disk, which drastically improves performance of reads and writes because RAM is generally 50x faster for sequential data reads and up to 100,000x faster for random access data. The downside is that holding data in RAM is significantly more expensive than storing data on a hard drive. Redis is typically used alongside another database as a cache to handle read requests.
InfluxDB as a key-value store
As a cutting-edge time series database, InfluxDB borrows some ideas from key-value database designs. Earlier versions of InfluxDB actually provided support for RocksDB as a storage engine.
InfluxDB maintains what’s called a time series database. In other words, the database is optimized for the use of time-stamped data. This empowers businesses in many ways. Queries can uncover information about the timeline of the time-stamped data, and figure out a lot more about the context of how something was added to the database. Time series databases provide a number of performance benefits for working with time series data compared to relational databases or key-value databases because they’ve been optimized for this type of workload and the unique queries required to gain valuable insights from time series data.
Key-value database FAQ
Key-value database definition
A key-value store or key-value database is a database that maps keys to values which can be any type of data. Key value databases can be used for storing collections of data which would not fit well together in a standard relational database.
When to use a key-value database?
Key-value databases can be used in many different situations. They can be used for inventory and product tracking, customer relationship management, and more. Developers can use them to scale and to implement different types of analysis applications. In short, key-value stores are good for any situation that requires flexible data models and scalable storage.
What’s the difference between key value and document databases?
Document databases are another type of noSQL design. Technically speaking, a document database is an extension of the key-value database model. Rather than just storing data based on a key that maps to a value, document databases store structured data. These documents can contain multiple defined values that can be indexed to speed up queries and grab related data. Document databases can be seen as somewhat of a middle ground between key-value databases and relational databases, where you have the option of creating a semi-structured schema if you choose.
How should companies view key-value database implementation?
Using key-value stores and other noSQL methods can help a company migrate a legacy system into something new that will better serve the needs of the business in the years to come. Legacy migration is often done to move data into cloud native systems that are more capable, less costly and easier to maintain. In the specific case of a key-value database, a company might move a subset of their data from a relational database to a key-value store or use a key-value store like Redis as a cache to serve data directly and only query the database for data not already in the cache.
Why did companies originally use relational database design?
The relational database design was originally created to solve the problem of data redundancy. It is a system that stores data in tables, where each table contains rows and columns. The rows represent records or instances of a particular type of information, and the columns represent attributes or fields.
In the early days of the database, cloud and big data services had not yet developed. Early databases were relatively simple and straightforward and did not account for the dynamic use patterns that businesses can adopt today.
Key-value database vs cache
Most key-value databases will have their own cache by default and will automatically keep frequently requested data in RAM, but an additional caching layer is always an option as well.
What is a distributed key-value database?
A distributed key-value database is a type of database that stores data in the form of key-value pairs with the data stored on multiple nodes, which are connected to each other via a network. The nodes are not required to be in the same location and can be spread across different geographical locations. This makes it possible for the database to be highly available and scalable.
The distributed database is accessed through a client application. The client application sends commands to the database server which then executes the commands in parallel with other nodes in the network. The distributed database is viewed as a single entity to the end user and they do not notice any performance degradation because of redundancy, and instead only see latency benefits.