What is a NoSQL database?
NoSQL databases are database systems that store and retrieve data without requiring it to fit into the predefined tables of a traditional relational database.
NoSQL databases emerged to address the scalability and flexibility issues present in relational databases, particularly in the context of large amounts of data and multi-user applications. They are designed to handle unstructured data and to scale across many servers.
While NoSQL databases provide solutions for scalability and flexibility, they are not a replacement for SQL databases but rather a complement, suited to particular types of projects. The choice between SQL and NoSQL databases depends on the specific data requirements of the application or project in question.
NoSQL database use cases
Due to being schemaless, NoSQL databases are a good fit for CMS applications because they are flexible and easy to modify to fit new data formats.
Real time analytics and big data
NoSQL databases are a good fit for big data because the majority of them are designed to scale horizontally out of the box. They are also a good choice for real-time analytics use cases because their design allows them to handle higher write throughput than relational databases.
IoT devices generate a massive amount of diverse, unstructured data. NoSQL databases can handle this kind of data efficiently, and their distributed nature suits IoT’s geographical distribution, since shards can be located regionally to improve performance.
Social networks have complex relationships that are best modeled with graph-based NoSQL databases. LinkedIn and Facebook both used graph databases heavily to power their applications.
NoSQL database benefits
One of the primary benefits of using a NoSQL database is that almost all of them provide easy-to-use horizontal scaling features by default. This makes it easier to scale as your application grows compared to most relational databases, which tend to rely on vertical scaling.
There are a variety of specialized NoSQL databases optimized for specific use cases, so you can choose the one that will give you the best performance. In general, NoSQL databases also excel at handling higher write volumes compared to relational databases.
Because of their flexible data models, it is easier to iterate and make changes to applications compared to a relational database. Specialized NoSQL databases like graph or time series databases also tend to provide built-in features so you don’t have to reinvent the wheel with custom code for common tasks associated with certain types of data.
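Horizontal scaling usually means splitting data across several servers (shards). As a minimal sketch, assuming simple hash-based partitioning, the idea can be illustrated with in-memory dictionaries standing in for servers. The names `shard_for` and `ShardedStore` are hypothetical and do not come from any particular database.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is salted per process, so a cryptographic
    digest is used here to get the same answer on every run.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store; each dict stands in for one server."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user:{i}", {"id": i})

# Every key lives on exactly one shard.
print([len(shard) for shard in store.shards])
```

Plain modulo hashing forces most keys to move when the number of shards changes, so production systems often use schemes such as consistent hashing or range partitioning instead.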
NoSQL database challenges
NoSQL databases offer a number of benefits, but that doesn’t mean they are magical. There are several potential downsides that you should keep in mind when choosing a database for your application.
Data model complexity
While the flexible data model of NoSQL databases is often a selling point, it can also become a major issue if an engineering team isn’t disciplined. It can result in issues coordinating on how the data model should look and where data should be located. It can also result in conflicting data or duplicate data.
Complex transaction support
For some applications that need strong data reliability guarantees for advanced transactions, NoSQL might not make sense. Most NoSQL databases provide some form of simple transaction support, but not ACID transactions. And for NoSQL databases that do offer ACID transactions, enabling them often means giving up some of the promised performance gains.
While popular relational databases like MySQL and Postgres are battle tested due to having been around for decades, many NoSQL projects aren’t as mature. This can result in obscure bugs, less community support, and the tool ecosystem being less robust.
Distributed databases are subject to the CAP theorem, which states that a system cannot guarantee all three of Consistency, Availability, and Partition tolerance at the same time. Because network partitions can’t be ruled out in practice, the real choice during a partition is between consistency and availability. Many NoSQL databases relax consistency to get better availability and performance, which means this type of database is best suited to use cases where eventual consistency is acceptable.
Types of NoSQL database
Key-value databases are the simplest form of NoSQL database: each value is tied to a unique key that identifies it. Key-value databases are generally used for tasks where quick lookups on simple data are needed, like caching, session management, and user information.
Document databases are similar to key-value databases but provide JSON-like structure to the data being stored and more advanced querying capabilities. MongoDB and CouchDB are examples of document databases.
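To illustrate what "JSON-like structure" and querying on nested fields mean, here is a minimal sketch in plain Python. The `matches` and `find` helpers and the sample documents are hypothetical. Real document databases such as MongoDB have their own query languages and indexes, which this does not attempt to reproduce.

```python
from typing import Any


def matches(doc: dict, query: dict) -> bool:
    """Return True if every dotted-path field in `query` equals the value in `doc`."""
    for path, expected in query.items():
        current: Any = doc
        for part in path.split("."):
            if not isinstance(current, dict) or part not in current:
                return False
            current = current[part]
        if current != expected:
            return False
    return True


def find(collection: list, query: dict) -> list:
    """Linear scan over the collection; a real database would use indexes."""
    return [doc for doc in collection if matches(doc, query)]


# Documents in one collection do not need identical fields.
articles = [
    {"_id": 1, "title": "Intro", "author": {"name": "Ada"}, "tags": ["db"]},
    {"_id": 2, "title": "Scaling", "author": {"name": "Lin"}},
    {"_id": 3, "title": "Notes", "author": {"name": "Ada"}, "draft": True},
]

print([doc["_id"] for doc in find(articles, {"author.name": "Ada"})])  # [1, 3]
```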
Columnar databases are used for OLAP and other analytics workloads. As the name suggests, the data is laid out on disk by column rather than by row as in traditional relational databases. This results in better data compression and faster aggregations. Redshift and ClickHouse are examples of columnar databases.
Graph databases represent data as nodes and edges to maintain relationships between data points. They often store data so that these relationships are preserved even on disk, which reduces RAM requirements because the entire dataset doesn’t need to be kept in memory. Queries for connected data points are far more efficient and easier to write than joining multiple tables in a SQL database. Graph databases are typically used for social networks and recommendation engines. Neo4j is an example of a graph database.
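The difference between row and column layout can be sketched with plain Python lists. This is only an illustration of the idea, with hypothetical sample data and helper names (`to_columns`, `run_length_encode`); it assumes every row has the same fields and says nothing about how any specific columnar database stores data.

```python
rows = [
    {"region": "eu", "product": "a", "revenue": 120.0},
    {"region": "eu", "product": "b", "revenue": 50.0},
    {"region": "us", "product": "b", "revenue": 80.0},
]


def to_columns(rows: list) -> dict:
    """Pivot row-oriented records into one list per column."""
    columns: dict = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values: list) -> list:
    """Compress runs of repeated values into [value, count] pairs."""
    encoded: list = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_columns(rows)

# An aggregation only has to read the one column it needs.
print(sum(columns["revenue"]))  # 250.0

# Values of the same type sit next to each other, so they compress well.
print(run_length_encode(columns["region"]))  # [['eu', 2], ['us', 1]]
```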
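A "friends of friends" query shows why following edges is a natural fit for connected data. The sketch below uses an in-memory adjacency list and a breadth-first search; the `within_hops` function and the sample graph are hypothetical, and a graph database would express this in its own query language rather than in application code.

```python
from collections import deque


def within_hops(graph: dict, start: str, max_hops: int) -> set:
    """Return all nodes reachable from `start` in at most `max_hops` edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if distance == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    seen.discard(start)
    return seen


# Each key is a node; each list holds the nodes it has an edge to.
friends = {
    "ana": ["ben", "cy"],
    "ben": ["ana", "dee"],
    "cy": ["ana"],
    "dee": ["ben", "eli"],
    "eli": ["dee"],
}

print(sorted(within_hops(friends, "ana", 1)))  # ['ben', 'cy']
print(sorted(within_hops(friends, "ana", 2)))  # ['ben', 'cy', 'dee']
```

The equivalent query in a relational schema would typically need one self-join per hop.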
Time series database
Time series databases are optimized for storing time series data like application metrics, IoT sensor data, or financial data. They specialize in handling high volumes of writes, indexing data quickly so it can be queried soon after ingestion, and running efficient queries on time ranges. An example of a TSDB is InfluxDB.
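Efficient time range queries usually depend on keeping points ordered by timestamp. As a rough sketch, assuming points arrive in timestamp order, a sorted list plus binary search is enough to show the idea. The `Series` class is hypothetical and is not how InfluxDB or any other TSDB is implemented.

```python
from bisect import bisect_left, bisect_right


class Series:
    """Append-only series that keeps timestamps sorted for fast range lookups."""

    def __init__(self):
        self.timestamps: list = []
        self.values: list = []

    def append(self, timestamp: int, value: float) -> None:
        # Assumption for this sketch: writes arrive in timestamp order.
        if self.timestamps and timestamp < self.timestamps[-1]:
            raise ValueError("out-of-order point")
        self.timestamps.append(timestamp)
        self.values.append(value)

    def range(self, start: int, end: int) -> list:
        """Return (timestamp, value) pairs with start <= timestamp <= end."""
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))


cpu = Series()
for ts, value in [(10, 1.0), (20, 2.0), (30, 3.0), (40, 4.0)]:
    cpu.append(ts, value)

print(cpu.range(15, 30))  # [(20, 2.0), (30, 3.0)]
```

Appending at the end is cheap, and the binary search finds the boundaries of a range without scanning the whole series.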
In-memory databases store all data in RAM for fast queries and writes. Being stored in memory also allows them to provide unique data structures that aren’t possible with disk-based databases. The downside is that RAM is expensive for larger datasets. Memcached and Redis are examples of in-memory databases.
Search engine database
Search engine databases are designed for complex search queries and full text searches. They are able to index multiple types of data and provide different ranking and scoring algorithms to fine-tune how results are returned. Elasticsearch is an example of a search engine database.
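Full text search is commonly built on an inverted index, which maps each term to the documents that contain it. The sketch below is a simplified illustration with hypothetical helper names (`tokenize`, `build_index`, `search`) and a naive term-count score; real search engines use more sophisticated analysis and ranking.

```python
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it on anything that is not a letter or digit."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def build_index(docs: dict) -> dict:
    """Map each term to a Counter of {doc_id: occurrences}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index


def search(index: dict, query: str) -> list:
    """Rank documents by how often the query terms occur in them."""
    scores: Counter = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked]


docs = {
    1: "NoSQL databases scale out",
    2: "Relational databases use SQL",
    3: "Scale out with NoSQL, scale fast",
}
index = build_index(docs)

print(search(index, "scale nosql"))  # [3, 1]
```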
Vector databases are a newer type of database that are used to store and query vector data for AI applications. Common use cases are for similarity search, image recognition, and generative AI. An example is Milvus.
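Similarity search means finding the stored vectors closest to a query vector. A minimal brute-force sketch using cosine similarity is shown below. The vectors and the `nearest` helper are made up for illustration; vector databases typically rely on approximate nearest neighbor indexes rather than comparing against every vector.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(vectors: dict, query: list, k: int = 1) -> list:
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        vectors,
        key=lambda name: cosine_similarity(vectors[name], query),
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

print(nearest(embeddings, [1.0, 0.0, 0.0], k=2))  # ['cat', 'dog']
```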
Wide column database
Wide column databases are sometimes considered a subset of columnar databases. They store data in column families, which makes them more versatile than purely column-oriented databases. Cassandra and HBase are examples.
Many NoSQL databases are technically multi-model, meaning they support different types of data models and indexes. Over time, many specialized databases add more functionality and are able to support more use cases as a single database.
NoSQL vs. SQL: What’s the difference?
Relational databases and NoSQL have a few key differences, with tradeoffs made by both types of databases. Which is better will depend on your specific use case.
- Data model - SQL databases use a relational model which organizes data into a table format with rows and columns. NoSQL databases have a large number of different data models to structure data.
- Scalability - Relational databases tend to scale vertically, meaning the hardware size of the server the database is running on is increased. This can limit scalability long term. NoSQL databases tend to be designed for horizontal scaling across commodity hardware.
- Data integrity - SQL databases support ACID transactions which guarantee data integrity at the cost of performance. NoSQL databases typically use BASE, which involves loosening consistency guarantees in return for performance.
- Query language - Relational databases use SQL as a query language, which is powerful but can become complex for certain types of queries. NoSQL databases tend to provide query languages that are simpler and easier to use for non database experts.
What are LSM trees and why do NoSQL databases use them?
LSM Trees, or Log-Structured Merge-Trees, are a type of data structure often used in NoSQL databases to handle write-heavy workloads efficiently. They were designed with the goal of reducing the cost of random writes to disk, which are more time consuming than sequential writes or reads.
An LSM Tree consists of two or more components: a memory component, often called a memtable, and one or more disk components. When a write operation occurs, the data is first written to the memtable. This write is typically fast because it’s performed in memory. When the memtable fills up, its contents are written to disk sequentially as a sorted file, and these files are merged in the background to keep reads efficient.
The LSM Tree design is used in several NoSQL databases, such as Cassandra and LevelDB, because it provides high write throughput, efficient storage space utilization, and tunable read performance. It’s especially useful in write intensive applications or situations where large amounts of data need to be written to disk quickly and frequently.
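The write path described above can be sketched in a few lines. In this toy version, the memtable is a dictionary and the "disk components" are sorted lists held in memory. The flush threshold, the `compact` step, and the `TinyLSM` name are assumptions based on the general LSM design rather than on how Cassandra or LevelDB implement it, and details such as write-ahead logging, deletes, and Bloom filters are left out.

```python
from bisect import bisect_left


class TinyLSM:
    """Toy LSM tree: an in-memory memtable plus sorted, immutable segments."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable_limit = memtable_limit
        self.memtable: dict = {}
        self.segments: list = []  # oldest first; each is a sorted list of (key, value)

    def put(self, key: str, value) -> None:
        # Writes only touch memory, which is what makes them fast.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Write the memtable out as one sorted segment (a sequential write on disk)."""
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str, default=None):
        # Newest data wins: check the memtable, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            i = bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return default

    def compact(self) -> None:
        """Merge all segments into one, keeping only the newest value per key."""
        merged: dict = {}
        for segment in self.segments:  # oldest first, so newer values overwrite
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # memtable is full, so it is flushed to a segment
db.put("a", 3)   # newer value for "a" lives in the memtable
db.put("c", 4)   # second flush

print(len(db.segments), db.get("a"), db.get("b"))  # 2 3 2
db.compact()
print(db.segments)  # [[('a', 3), ('b', 2), ('c', 4)]]
```

Reads may have to check several segments, which is why LSM-based stores tune read performance with compaction: fewer segments mean fewer places to look.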