What is cardinality?
In the context of databases, cardinality is the number of unique values in a data set. Specifically, it refers to the total number of unique values possible within a table column or its database equivalent.
For many time series databases, cardinality can become a problem when fields with unbounded values, such as user IDs, email addresses, or container IDs, are used as tags. High cardinality can cause performance issues at scale because of how some time series databases index data.
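To see why unbounded tags are dangerous, it helps to remember that series cardinality grows multiplicatively with the number of unique values per tag. The sketch below is a hypothetical illustration in Python (the tag names and value counts are invented for the example, not taken from any real schema):

```python
from itertools import product

# Bounded tags: a fixed set of hosts and regions.
hosts = [f"host-{i}" for i in range(10)]
regions = ["us-east", "us-west", "eu-central"]

# Worst-case series count is the product of unique values per tag.
bounded_series = len(set(product(hosts, regions)))
print(bounded_series)  # 10 hosts x 3 regions = 30 series

# Adding an unbounded tag such as a user ID multiplies cardinality:
user_ids = [f"user-{i}" for i in range(100_000)]
unbounded_series = len(hosts) * len(regions) * len(user_ids)
print(unbounded_series)  # 3,000,000 potential series
```

With only bounded tags, the series count stays small and predictable; one unbounded tag is enough to push it into the millions.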
Cardinality with InfluxDB
InfluxDB users typically discover that they have a cardinality problem in one of two ways:
- InfluxDB Cloud notifies them when they hit cardinality limits.
- They notice that reads and sometimes writes are getting slower and slower on InfluxDB Cloud or InfluxDB OSS 2.0.
When you write to InfluxDB, it uses your measurements and tags to create indexes that speed up reads. However, when too many indexes are created, both writes and reads can actually start to slow down. This required some extra planning when tagging data for certain workloads, to avoid cardinality-related performance issues.
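A rough mental model of this trade-off: every never-before-seen tag combination (a "series key") adds a new index entry, so bounded tags reuse existing entries while unbounded tags keep allocating new ones. The following sketch is an assumed, simplified model of that behavior, not InfluxDB's actual index implementation:

```python
# Simplified model: the index maps a series key to a series ID.
index = {}
next_id = 0

def series_key(measurement, tags):
    """A series key is the measurement plus its sorted tag set."""
    return (measurement, tuple(sorted(tags.items())))

def write_point(measurement, tags):
    """Writing a point with a new tag combination grows the index."""
    global next_id
    key = series_key(measurement, tags)
    if key not in index:      # a never-seen tag combination...
        index[key] = next_id  # ...creates a new index entry
        next_id += 1
    return index[key]

# Bounded tags reuse the same index entry:
write_point("cpu", {"host": "host-1", "region": "us-east"})
write_point("cpu", {"host": "host-1", "region": "us-east"})
print(len(index))  # 1

# An unbounded tag (e.g. a user ID) adds an entry on every write:
for i in range(1000):
    write_point("requests", {"user_id": f"user-{i}"})
print(len(index))  # 1001
```

Under this model, reads stay fast while the index is small, but an index that grows with every write eventually slows both reads and writes down.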
However, with the release of InfluxDB’s new column-based storage engine, this is no longer an issue and InfluxDB can be used for workloads that require unbounded cardinality like observability and distributed tracing.