How to Choose the Right Database in 2023
By Charles Mahler / Apr 03, 2023 / Community
This article was originally published in The New Stack and is reposted here with permission.
Databases are often the biggest performance bottleneck in an application. They are also hard to migrate from once being used in production, so making the right choice for your application’s database is crucial.
A big part of making the right decision is knowing what your options are. The database landscape has been changing rapidly in the past few years, so this article will try to simplify things for you by going over the following topics:
- An overview of the database ecosystem in 2023
- What actually makes different types of databases perform differently from a technical perspective
- When to use a specialized database vs. a general-purpose database
The database landscape in 2023
Before diving into things, let’s look at a snapshot of the current database ecosystem and the market share of various types of databases:
As you can see, relational databases are still the most used type of database despite all the hype around NoSQL databases. However, if we look at recent trends, the ranking tells a slightly different story.
This chart shows that over the past two years relational databases have been losing a bit of ground to several different types of database models. The following are some of the main database models that are gaining adoption with developers:
- Document databases
- Graph databases
- Time-series databases
- Columnar databases
- In-memory databases
- Key-value databases
- Search engine databases
What makes databases perform differently?
When it comes down to database performance, there’s nothing magical that makes one perform better than another. Like all things computer science, it comes down to trade-offs that allow performance to be optimized for specific use cases. For databases specifically, CAP theorem is a good introduction to some of the possible trade-offs made to tune performance.
For example, in the early days of NoSQL databases, there was a lot of hype around their scalability, but the trade-off generally involved sacrificing data consistency guarantees provided by standard relational databases.
Some other design factors that will affect how a database performs:
- On-disk storage format — How a database actually stores and organizes data on hard drives has a major impact on performance. As more companies begin storing huge amounts of data intended for analytics workloads, storing data on disks in a column-based format like Parquet is gaining popularity.
- Primary index data structure — How a database indexes data will also have a major impact on performance. Databases generally have a primary index used by their storage engine and then allow users to define secondary indexes. The simplest way to think about indexing is that they will help improve read performance but add overhead to writing new data points.
- Data compression — How data is compressed will factor into how much it costs to store your data and the query performance of the database. Some compression algorithms are designed to reduce the size of your data as much as possible. Others might have a lower compression ratio but are faster when it comes to decompressing the data, which means that you get better query performance of your data.
- Hot and cold storage — Many database systems now allow for data to be moved between faster and more expensive “hot” storage, and cheaper but slower “cold” storage. In theory this allows for better performance for frequently queried data and for saving money on storage while still allowing the data in cold storage to be accessed rather than outright deleted.
- Durability/disaster recovery — How a database handles disaster recovery plays a role in performance as well. Designing a database to mitigate various failures will generally decrease performance, so for some use cases where data isn’t mission critical and occasionally losing data points is fine, databases can remove some safety guarantees to squeeze out better performance.
All of these factors, as well as many others that weren’t covered, play into the performance of a database. By twisting these levers, a database can be optimized for very specific performance characteristics, and sacrificing certain things won’t actually be a problem because they aren’t needed for a certain situation.
When to use a specialized database for your application
There are a number of factors that go into deciding which database to use for your app. Let’s take a look at some of the major things you need to consider when choosing the database for your application.
Data access patterns
The primary factor in choosing a database is how the data in your application will be created and used. The broadest way to start with is probably to determine whether your workload will be online analytical processing (OLAP) or online transaction processing (OLTP). OLAP workloads are analytics-focused and have different access patterns compared to the more standard OLTP workloads that relational databases were designed to handle. OLAP queries generally only hit a few columns to perform calculations and can be optimized by using a columnar database designed for this. As an example, most data warehouses are built on top of column-oriented databases due to the performance benefits.
Once you’ve broadly determined the type of workload, you now need to consider things like the latency requirements for queries and how frequently data will be written. If your use case needs near-real-time queries with low latency for tasks like monitoring, you might consider a time-series database that is designed to handle high write throughput while also allowing data to be queried soon after ingest.
For OLTP-style workloads, you’ll typically be deciding between a relational database or a document database. The key factor here will be looking at your data model and determining whether you want the schema flexibility provided by NoSQL document databases or if you would prefer the consistency guarantees provided by relational databases.
One final thing you could consider is whether you expect your workload to be fairly consistent throughout the day or if it will be “bursty” and require your database to occasionally handle far larger volumes of reads and writes. In this case, it would make sense to use a database that makes it easy to scale your hardware up and down so you aren’t facing downtime or high costs for hardware that isn’t required most of the time.
Your team’s existing skill set should be taken into consideration when deciding on what to use for your database. You need to determine whether the potential gains of using a specialized database are worth the investment in training your team to learn how to use it and the lost productivity while learning a new technology.
If you know the service you are building won’t need to be fully optimized for performance, it would be fine to use whatever database your team is most familiar with to get the job done. On the other hand, if you know performance is critical, it may be worth the growing pains of adopting a new database.
Keeping your software’s architecture as simple as possible is ideal, so adding another component to a system like a new database should be weighed against the additional complexity that managing the database will add to the system.
This isn’t as big of an issue if your application is such an ideal fit for a specialized database that it can act as the primary database for the application’s data. On the other hand, if you will be using a more general-purpose database as the primary storage for the app, bringing on an additional database for a subset of the data might not be worth it unless you are facing serious performance problems.
The database ecosystem is evolving rapidly. While going with a database you know is always a good option, it makes sense for developers to keep tabs on some of the new technologies that are being released and see if they are a good option for what you are building. Building on a specialized database can help your application succeed in a number of ways by saving you money on costs, improving performance for users, making it easier to scale and improving developer productivity.