Query Distinct Tag Values in Under 30ms with the InfluxDB 3 Distinct Value Cache

Navigate to:

The Distinct Value Cache (DVC) available with InfluxDB 3 Core and InfluxDB 3 Enterprise lets you cache distinct values of specific columns and query those values in under 30ms.

The DVC is an in-memory cache that stores distinct values of one or more columns in a table. It is typically used to cache distinct tag values, but you can also cache distinct field values. When you create a DVC, you specify what columns’ distinct values to cache, the maximum number of distinct value combinations to cache, and how long to keep distinct values in the cache (TTL).

Why use a Distinct Value Cache?

The DVC provides a simple and performant way to query distinct column values in under 30ms. This type of query is commonly used to populate selectable options in web applications. For example, you can create a Grafana template variable that lets you select from a list of distinct tag values and modify a dashboard’s queries to display data specific to the selected values.

To return a list of distinct tag values without using the DVC, you would use a query similar to:

SELECT DISTINCT
  region
FROM
  system_monitor
WHERE
  time ">"= now() - INTERVAL '7 days'
  AND time ""= now()

This query requires time bounds to prevent the query engine from having to read all rows in the queried table. Without time bounds or when the queried time range is too large, the query could be very “heavy,” potentially requiring a resource-hungry full table scan. However, the downside of the time bounds is that the query won’t return distinct values outside of the queried time range.

To query distinct values from the DVC, the query would look similar to:

SELECT region FROM distinct_cache('system_monitor', 'region_cache')

This query doesn’t need to perform a full table scan to derive a list of distinct column values; it just queries the list of distinct values from the in-memory cache. Distinct value expiration is handled by the cache, ensuring the query returns all unexpired distinct values, regardless of time range. And it returns results in approximately 30 ms.

Set up a Distinct Value Cache

Each DVC is associated with a table, and a table can have multiple DVCs. You can add a DVC to an existing table, but for this example, we’ll create a new table to store the European Union (EU) wind sample dataset.

  1. Use the influxdb3 create table command to create a new wind_data table. Because we know the schema of the sample data, we can pre-create the table with the necessary tag and field columns:
influxdb3 create table \
  --token INFLUXDB_TOKEN \
  --database EXAMPLE_DB \
  --tags country,county,city \
  --fields wind_direction:int64,wind_speed:float64 \
  wind_data

Replace the following:

  • INFLUXDB_TOKEN: Your InfluxDB admin token.
  • EXAMPLE_DB: The name of the target database.

Note: You can also use InfluxDB 3 Explorer to create and manage DVCs.

  1. Use the influxdb3 create distinct_cache command to create a new DVC associated with the wind_data table. You can provide the following:
  • Table (--table): (Required) The name of the table to associate the DVC with.
  • Cache name: A unique name for the cache. If you don’t provide one, InfluxDB automatically generates a cache name for you.
  • Columns (--columns): Specify which columns to include in the cache. The cached columns will only include the distinct values from that column. Columns that benefit from a DVC are typically tags, but you can cache any string-typed column, including fields.

Tip: The DVC structures columns in the order you provide them when creating the cache. Column order determines how rows in the cache are sorted and can affect query performance. If column values are hierarchical—meaning one column is a subset of another column—list the columns highest in hierarchical order first.

In this example, we’ll create a DVC named wind_locations associated with the wind_data table. We’ll cache distinct values from the country, county, and city columns:

influxdb3 create distinct_cache \
  --token INFLUXDB_TOKEN \
  --database EXAMPLE_DB \
  --table wind_data \
  --columns country,county,city \
  wind_locations

Replace the following:

  • INFLUXDB_TOKEN: Your InfluxDB admin token.
  • EXAMPLE_DB: The name of the target database.
  1. Write data to the table associated with the cache. For this example, write the EU wind sample data (the link provides other write methods and commands).
influxdb3 write \
  --token INFLUXDB_TOKEN \
  --database EXAMPLE_DB \
  "$(curl --request GET https://docs.influxdata.com/downloads/eu-wind-data.lp)"

Replace the following:

  • INFLUXDB_TOKEN: Your InfluxDB token with write access to the target database.
  • EXAMPLE_DB: The name of the target database.

Optimize Cache Size

DVCs provide options to help optimize the size and memory footprint of the cache. Use the following options to ensure your cache doesn’t grow too large:

  • --max-cardinality: Specify the maximum number of unique column value combinations in your cache. For example, using the hierarchical schema of the EU wind sample data, each unique combination of country, county, and city counts against the cardinality limit. The default maximum cardinality is 100000 (one-hundred thousand).
  • --max-age: Specify the maximum age or time to live (TTL) for values in the cache. The age of each value is reset each time that a distinct value is written to InfluxDB. The default maximum age is 24 hours.

Query data in the Distinct Value Cache

Use the distinct_cache() function in the FROM clause of a SQL SELECT statement to query data from the DVC. distinct_cache() supports the following arguments:

  • table_name: (Required) the name of the table the DVC is associated with, formatted as a string literal
  • cache_name: the name of the DVC to query from, formatted as a string literal (only required if there is more than one DVC associated with the table)
distinct_cache(table_name, cache_name)

To query the DVC for the written sample data, execute the following query:

SELECT * FROM distinct_cache('wind_data', 'wind_locations')

This is just a normal SQL query, so you can include other SQL clauses to modify query results. For example, if you only want cities in Spain, you can use the following query:

SELECT
  city
FROM
  distinct_cache('wind_data', 'wind_locations')
WHERE
  country = 'Spain'

Note: InfluxQL does not support the distinct_cache() function. You can only query data in a DVC using SQL.

The DVC in practice

The target use case for the DVC is to help build performant user experiences that display distinct column values for users. For example, creating a UI that lists all the unique values of a tag. Using the DVC will help you to quickly query and return that list.

Grafana Dashboard Variables

One common use case that greatly benefits from using the DVC is creating query-based variables for a Grafana dashboard. Let’s say you want to create a dashboard for the EU wind sample data that lets users select what country, county, and cities to include in the visualizations.

Note: Grafana 12.2.0 fixed a bug that prevented DVC queries from successfully returning results. If using an earlier version of Grafana, cast each column in your SELECT clause to a STRING type–for example: SELECT country::STRING ...

Return a list of distinct countries

Create a country variable and use the following query to return the variable values:

SELECT
  country
FROM
  distinct_cache('wind_data', 'wind_locations')

Return a list of distinct counties in the selected countries

Create a county variable and use the following query to return counties from the selected countries:

SELECT
  county
FROM
  distinct_cache('wind_data', 'wind_locations')
WHERE
  country IN (${country:singlequote})

Return a list of cities in the selected countries and counties

Create a city variable and return the list of cities from the selected countries and counties:

SELECT
  city
FROM
  distinct_cache('wind_data', 'wind_locations')
WHERE
  country IN (${country:singlequote})
  AND county IN (${county:singlequote})

Use the variable selections in your Grafana dashboard queries

In each of your dashboard queries, use the variables to filter on country, county, and city. The example below uses multi-select variables that return a list of values. It uses Grafana’s variable interpolation to structure the list of selected variable values as a SQL array. The variables also include an All option, and the query changes the behavior of the WHERE conditions if All is selected.

SELECT
  time,
  city,
  wind_direction
FROM
  wind_data
WHERE
  time ">"= $__timeFrom
  AND time ">"= $__timeTo
  AND country IN (${country:singlequote})
  AND CASE
        WHEN 'All' IN (${county:singlequote}) THEN TRUE
        ELSE county IN (${county:singlequote})
      END
  AND CASE
        WHEN 'All' IN (${city:singlequote}) THEN TRUE
        ELSE city IN (${city:singlequote})
      END
ORDER BY
  time,
  city

To see an example Grafana dashboard that visualizes the EU wind sample data and uses the InfluxDB DVC to populate dashboard variables, download and install the EU Wind Sample Data Grafana Dashboard.

What to know about the Distinct Value Cache

The InfluxDB 3 Distinct Value Cache is an incredibly powerful tool, but there are important things to know when using it.

High-Cardinality Distinct Value Combinations

DVCs are stored in memory, and the larger the cache, the more memory it requires to maintain the cache. It’s critical to balance the size of your DVCs with the amount of memory it takes to store them. The higher the cardinality of your distinct value combinations, the larger your cache. “Cardinality” refers to the number of unique column combinations in your cached data. As a best practice, only cache distinct values from columns that are important to your query workload. Caching distinct tag or field values unnecessarily results in higher cardinality and memory usage without any benefit.

Distinct Value Caches are flushed when the server stops

Because the DVC is an in-memory cache, any time the server stops, the cache is flushed. After a server restart, InfluxDB 3 Enterprise queries previously written data and populates the cache. However, InfluxDB 3 Core does not and only writes new values to the DVC when you write data.

Share your feedback

The InfluxDB 3 Distinct Value Cache is a powerful device that lets you get the best performance on queries that need to return distinct column values. It’s another tool in your time series toolbelt that helps make sure your workload is as performant as possible.

Try the DVC and let us know what you think! Check out our Getting Started Guide for Core and Enterprise, and share your feedback with our development team on Slack in the #influxdb3_core channel, or on our Community Site.