What is cardinality?
In the context of databases, cardinality is the number of distinct values in a set of data. Specifically, it refers to the total number of unique values in a table column or its database equivalent.
For many time series databases, cardinality can become a problem if fields with unbounded values are chosen as tags. This includes data such as user IDs, email addresses, IP addresses, tracing span IDs, container IDs, and more. High cardinality can cause performance issues at scale because of how some time series databases index data.
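To make the definition concrete, the following minimal sketch counts the distinct values in each column of a small in-memory dataset. The column names, the sample rows, and the `column_cardinality` helper are all hypothetical and chosen only for illustration. The sketch contrasts a bounded column (`region`) with an unbounded one (`user_id`).

```python
from collections import defaultdict


def column_cardinality(rows):
    """Return the number of distinct values seen in each column."""
    distinct = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            distinct[column].add(value)
    return {column: len(values) for column, values in distinct.items()}


# Hypothetical sample data: three possible regions, but a new user ID on every row.
regions = ["us-east", "us-west", "eu-central"]
rows = [
    {"region": regions[i % len(regions)], "user_id": f"user-{i}"}
    for i in range(1000)
]

cardinality = column_cardinality(rows)
print(cardinality)  # {'region': 3, 'user_id': 1000}
```

The `region` column stays at three distinct values no matter how many rows arrive, while the cardinality of `user_id` grows with the data. That growth is what makes such a column a risky choice for a tag.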
Cardinality in InfluxDB Cloud powered by IOx
With the release of InfluxDB’s column-based storage engine, InfluxDB can handle time series data and workloads that contain unbounded cardinality. This effectively eliminates the cardinality issue and facilitates use cases like observability and distributed tracing that require high cardinality data.
Cardinality with InfluxDB TSM
InfluxDB users typically discover that they have a cardinality problem in one of two ways:
- InfluxDB Cloud notifies them when they hit cardinality limits.
- They notice that reads, and sometimes writes, are getting slower and slower on InfluxDB Cloud or InfluxDB OSS 2.0.
InfluxDB uses the measurements and tags that you write to the database to build an index that speeds up reads. However, when that index holds too many entries, both writes and reads can start to slow down. Certain workloads therefore require extra planning when choosing tags, to avoid cardinality-related performance issues.
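As a rough illustration of why tag choice matters, the sketch below counts distinct series in a batch of points. It assumes a simplified model in which a series is identified by its measurement name plus its full set of tag key-value pairs. Field keys are ignored, and this is not InfluxDB's actual index implementation. The `series_key` and `series_cardinality` helpers and the sample tags are hypothetical.

```python
def series_key(measurement, tags):
    """Build a hashable key from a measurement name and its tag set.

    Tags are sorted so the same tag set always yields the same key,
    regardless of the order in which the tags were written.
    """
    return (measurement, tuple(sorted(tags.items())))


def series_cardinality(points):
    """Count distinct (measurement, tag set) combinations in the points."""
    return len({series_key(measurement, tags) for measurement, tags in points})


hosts = [f"host-{i}" for i in range(4)]
regions = ["us-east", "eu-central"]

# Bounded tags only: at most 4 hosts x 2 regions = 8 series.
bounded = [
    ("cpu", {"host": host, "region": region})
    for host in hosts
    for region in regions
]

# Adding an unbounded tag (a unique container ID per point) creates
# one new series for every point written.
unbounded = [
    (
        "cpu",
        {
            "host": hosts[i % len(hosts)],
            "region": regions[i % len(regions)],
            "container_id": f"c-{i}",
        },
    )
    for i in range(500)
]

bounded_count = series_cardinality(bounded)
unbounded_count = series_cardinality(unbounded)
print(bounded_count, unbounded_count)  # 8 500
```

Under this model, the bounded tag set caps the number of series at the product of the tag value counts. A single unbounded tag removes that cap, so the index grows with every write.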
For those using InfluxDB’s Time-Structured Merge Tree (TSM) engine, the following templates can help identify and monitor data cardinality.