<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>InfluxData Blog - Reid Kaufmann</title>
    <description>Posts by Reid Kaufmann on the InfluxData Blog</description>
    <link>https://www.influxdata.com/blog/author/reid-kaufmann/</link>
    <language>en-us</language>
    <lastBuildDate>Tue, 20 Jan 2026 08:00:00 +0000</lastBuildDate>
    <pubDate>Tue, 20 Jan 2026 08:00:00 +0000</pubDate>
    <ttl>1800</ttl>
    <item>
      <title>A New Way to Debug Query Performance in Cloud Dedicated</title>
      <description>&lt;p&gt;I’d like to share a new &lt;code class="language-markup"&gt;influxctl&lt;/code&gt; ease-of-use feature in &lt;a href="https://www.influxdata.com/downloads/?utm_source=website&amp;amp;utm_medium=new_query_tuning_feature_influxdb&amp;amp;utm_content=blog"&gt;v2.12.0&lt;/a&gt; that makes it easier to optimize important queries or debug slow ones. &lt;code class="language-markup"&gt;influxctl&lt;/code&gt; has had the capability to send queries and display the results in JSON or tabular formats for some time. (Note: this CLI utility is specific to Cloud Dedicated and Clustered, as are many of the specifics in this post.) In Clustered, you can monitor querier pods’ logs, and in both Dedicated and Clustered, &lt;a href="https://docs.influxdata.com/influxdb3/cloud-dedicated/admin/query-system-data/#query-logs"&gt;metrics on individual queries’ performance can be found in the system tables&lt;/a&gt;. Both of those options offer a lot of data—enough that it can be hard to digest quickly. Additionally, associating a single execution of a query to its log entry is tedious. A new feature, the &lt;code class="language-markup"&gt;--perf-debug&lt;/code&gt; flag for the &lt;code class="language-markup"&gt;influxctl&lt;/code&gt; query command (&lt;a href="https://docs.influxdata.com/influxdb3/cloud-dedicated/reference/release-notes/influxctl/#2120"&gt;release notes&lt;/a&gt;), accelerates the experimentation cycle by providing real-time feedback, allowing you to stay in the context of your shell as you tweak your query.&lt;/p&gt;

&lt;h2 id="sample-output"&gt;Sample output&lt;/h2&gt;

&lt;p&gt;The new flag, &lt;code class="language-markup"&gt;--perf-debug&lt;/code&gt;, will execute a query, collect and discard the results, and emit execution metrics instead. When &lt;code class="language-markup"&gt;--format&lt;/code&gt; is omitted, output defaults to a tabular format with units dynamically chosen for human readability. In the second execution below, &lt;code class="language-markup"&gt;--format json&lt;/code&gt; is specified to emit a data format appropriate for programmatic consumption: in a nod to the querier log, it uses keys with shorter variable names, delimits words with underscores, and sports consistent units (bytes, seconds as a float).&lt;/p&gt;

&lt;p&gt;In the tabular format, you can also see a demarcation between client and server metrics.&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-bash"&gt;$ influxctl query --perf-debug --token REDACTED --database reidtest3 --language influxql "SELECT SUM(i), non_negative_difference(SUM(i)) as diff_i FROM data WHERE time &amp;gt; '2025-11-07T01:20:00Z' AND time " '2025-11-07T03:00:00Z' AND runid = '540cd752bb6411f0a23e30894adea878' GROUP BY time(5m)"
+--------------------------+----------+
| Metric                   | Value    |
+--------------------------+----------+
| Client Duration          | 1.222 s  |
| Output Rows              | 20       |
| Output Size              | 647 B    |
+--------------------------+----------+
| Compute Duration         | 37.2 ms  |
| Execution Duration       | 243.8 ms |
| Ingester Latency Data    | 0        |
| Ingester Latency Plan    | 0        |
| Ingester Partition Count | 0        |
| Ingester Response        | 0 B      |
| Ingester Response Rows   | 0        |
| Max Memory               | 70 KiB   |
| Parquet Files            | 1        |
| Partitions               | 1        |
| Planning Duration        | 9.6 ms   |
| Queue Duration           | 286.6 µs |
+--------------------------+----------+

$ influxctl query --perf-debug --format json --token REDACTED --database reidtest3 --language influxql "SELECT SUM(i), non_negative_difference(SUM(i)) as diff_i FROM data WHERE time &amp;gt; '2025-11-07T01:20:00Z' AND time &amp;lt; '2025-11-07T03:00:00Z' AND runid = '540cd752bb6411f0a23e30894adea878' GROUP BY time(5m)"
{
  "client_duration_secs": 1.101,
  "compute_duration_secs": 0.037,
  "execution_duration_secs": 0.247,
  "ingester_latency_data": 0,
  "ingester_latency_plan": 0,
  "ingester_partition_count": 0,
  "ingester_response_bytes": 0,
  "ingester_response_rows": 0,
  "max_memory_bytes": 71744,
  "output_bytes": 647,
  "output_rows": 20,
  "parquet_files": 1,
  "partitions": 1,
  "planning_duration_secs": 0.009,
  "queue_duration_secs": 0
}&lt;/code&gt;&lt;/pre&gt;
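
&lt;p&gt;As a sketch of programmatic consumption, here is one way the &lt;code class="language-markup"&gt;--format json&lt;/code&gt; output might be post-processed. The keys come from the sample above; treating execution + planning + queue time as the "server total" is an illustrative assumption on my part, not an official formula.&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-python"&gt;import json

# JSON captured from an influxctl --perf-debug --format json run
# (values copied from the sample output above).
raw = """{
  "client_duration_secs": 1.101,
  "execution_duration_secs": 0.247,
  "planning_duration_secs": 0.009,
  "queue_duration_secs": 0,
  "parquet_files": 1,
  "output_rows": 20
}"""

metrics = json.loads(raw)

# A rough server-side total: execution + planning + queue time.
server_total = sum(metrics[k] for k in
                   ("execution_duration_secs", "planning_duration_secs",
                    "queue_duration_secs"))

# The remainder approximates client-side overhead (connection setup, transfer).
overhead = metrics["client_duration_secs"] - server_total
print(f"server={server_total:.3f}s client_overhead={overhead:.3f}s")&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Feeding several such captures through a script like this makes it easy to spot regressions while iterating on a query.&lt;/p&gt;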

&lt;h2 id="notes"&gt;Notes&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Client duration&lt;/strong&gt; includes the time to open the connection to the server. In the example, you can see a big delta between that and the server’s total duration. When I ran this command, my client and database server were not colocated. Additionally, &lt;code class="language-markup"&gt;influxctl&lt;/code&gt; may not be tuned for optimal connection latency. Your native client probably caches connections and might not suffer this latency. When tuning your query, it’s more important to look at the durations recorded by the server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Output size&lt;/strong&gt; is the size of the Arrow-format data in memory, after gzip inflation (if client and server agree on compression), so this metric does not report the bytes transferred over the network. Network bytes transferred might be more useful, so that’s a potential future enhancement. Still, the current metric serves as a relative measure for comparing different queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ingester metrics&lt;/strong&gt; are zeroed out if the ingester has no partitions with unpersisted data matching the query. In Serverless, Dedicated, and Clustered, queries always consult ingesters, so the 0 in ingester latency can be misleading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parquet files&lt;/strong&gt; indicates how many files were traversed for the query. However, if the query was optimized by a ProgressiveEvalExec plan (typically simple sorted LIMIT queries without aggregations; verify with &lt;a href="https://docs.influxdata.com/influxdb3/cloud-dedicated/query-data/troubleshoot-and-optimize/analyze-query-plan/"&gt;EXPLAIN ANALYZE&lt;/a&gt;), this value may not be useful: it is calculated during planning, so it reflects the number of files the time range could touch rather than the number actually accessed before reaching the LIMIT. For most queries, this metric is a handy indicator. It’s also worth noting that the query log contains a related metric, &lt;code class="language-markup"&gt;deduplicated_parquet_files&lt;/code&gt;, which tells us how many of the files had overlapping time ranges, requiring the querier to merge/sort/deduplicate data. It’s normal to have a few such files at the leading edge, but this operation becomes a serious bottleneck if too much data needs to be deduplicated (managing this problem is the compactor’s main responsibility).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query durations vary&lt;/strong&gt;, and a query can be executed several times (and at different times of day) to get a sense of the variation.&lt;/p&gt;
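
&lt;p&gt;For example, after collecting the &lt;code class="language-markup"&gt;client_duration_secs&lt;/code&gt; value from a handful of runs, a few lines of Python summarize the spread (the sample values below are hypothetical):&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-python"&gt;import statistics

# Hypothetical client_duration_secs values from five executions of one query.
samples = [1.222, 1.101, 1.350, 1.180, 1.240]

mean = statistics.mean(samples)
spread = statistics.pstdev(samples)  # population standard deviation
print(f"mean={mean:.3f}s stdev={spread:.3f}s "
      f"min={min(samples):.3f}s max={max(samples):.3f}s")&lt;/code&gt;&lt;/pre&gt;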

&lt;h2 id="potential-sources-of-latency-or-variability"&gt;Potential sources of latency or variability&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cache warmup&lt;/strong&gt; (shows up in &lt;strong&gt;Execution Duration&lt;/strong&gt;): The first few times a query over a particular time frame for a table is executed, the duration may be significantly higher due to parquet cache misses. Queries fan out to multiple queriers in a round-robin fashion, and each querier has an independent parquet cache, so expect a few cache misses while each querier incurs the delay of retrieving parquet files from the object store. Because there are multiple load balancer pods and other clients executing queries, how many executions it takes to warm every querier’s cache is indeterminate. If the queries are on the “leading” edge of the data, be aware that persistence of new data or compaction may also periodically cause cache misses. Large L2 file compactions cause a greater disruption, while the latency from typical small incremental persists may be imperceptible.&lt;/p&gt;

&lt;p&gt;Corollary: &lt;strong&gt;cache eviction&lt;/strong&gt;. Other queries executing may cause cache eviction to make room for their data. Given a high rate of queries covering a lot of data (many series and/or a wide time frame), it’s possible to thrash the cache. In this case, &lt;code class="language-markup"&gt;influxctl&lt;/code&gt; can’t provide much context about other queries running at the same time (exception: a non-zero &lt;strong&gt;Queue Duration&lt;/strong&gt; does indicate maximum execution concurrency was reached). You may still need to review the query log or observability dashboards. Some query loads are cyclical, and so is the work of the compactor, depending on ingest and partitioning rates; therefore, you may get better performance in the afternoon than in the morning. When the CPU is maxed out, it tends to increase all recorded server latencies.&lt;/p&gt;

&lt;p&gt;Variation in &lt;strong&gt;data density&lt;/strong&gt; or &lt;strong&gt;volume&lt;/strong&gt; will affect all queries to some degree, but impacts computationally intensive queries the most. This shows up in &lt;strong&gt;Execution Duration&lt;/strong&gt;. Monitor the &lt;strong&gt;Parquet Files&lt;/strong&gt; or &lt;strong&gt;Output Rows&lt;/strong&gt; metrics as possible proxies for it. Be aware that changing a tag value in the &lt;code class="language-markup"&gt;WHERE&lt;/code&gt; clause or the time constraints may affect latency, depending on the underlying data; not all writers necessarily write at the same frequency. When tuning aggregate queries, you may occasionally want to add a &lt;code class="language-markup"&gt;COUNT()&lt;/code&gt; field and drop the &lt;code class="language-markup"&gt;--perf-debug&lt;/code&gt; flag to see how many records are contributing. For some queries (&lt;code class="language-markup"&gt;SELECT DISTINCT&lt;/code&gt;, for example), tag cardinality and time range can greatly impact performance.&lt;/p&gt;

&lt;p&gt;See &lt;a href="https://docs.influxdata.com/influxdb3/cloud-dedicated/query-data/troubleshoot-and-optimize/optimize-queries/"&gt;documentation&lt;/a&gt; for more general information on query optimization.&lt;/p&gt;

&lt;h2 id="other-things-to-try"&gt;Other things to try&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Check if &lt;strong&gt;Planning Duration&lt;/strong&gt; is substantially higher than &lt;strong&gt;Execution Duration&lt;/strong&gt;. This can be caused by high numbers of tables or partitions, which may be excessive or intended. Custom partitioning can help reduce execution latency, but can increase planning latency—find the right balance for your workload.&lt;/li&gt;
  &lt;li&gt;Check if &lt;strong&gt;Ingester Latency&lt;/strong&gt; or &lt;strong&gt;Response&lt;/strong&gt; is abnormally high/large. It may indicate a need for, or a problem &lt;em&gt;with&lt;/em&gt;, custom partitioning, resulting in excessive delay in persisting partitions.&lt;/li&gt;
  &lt;li&gt;If &lt;strong&gt;Parquet Files&lt;/strong&gt; is abnormally large, check that the query has a time constraint and that it’s reasonable. Also, check observability dashboards to see whether the compactor is keeping up, or look for &lt;a href="https://docs.influxdata.com/influxdb3/clustered/admin/query-system-data/#view-systemcompactor-schema"&gt;skipped partitions&lt;/a&gt;. If custom partitioning on a tag is in use, make sure the query specifies a value for that tag in the &lt;code class="language-markup"&gt;WHERE&lt;/code&gt; clause (note that regexes that don’t equate to simple equality checks on that tag will also prevent partition pruning).&lt;/li&gt;
  &lt;li&gt;How much does increasing or decreasing the time range of the query change the execution metrics?&lt;/li&gt;
  &lt;li&gt;Compare similar queries against different tables, schemas, or partitioning schemes.&lt;/li&gt;
  &lt;li&gt;Compare different means of achieving the same result (SQL’s &lt;code class="language-markup"&gt;ORDER BY time DESC LIMIT 1&lt;/code&gt; vs. InfluxQL’s &lt;code class="language-markup"&gt;LAST()&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
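
&lt;p&gt;A tiny helper can make the time-range question above concrete: run the same query twice with different time ranges via &lt;code class="language-markup"&gt;--format json&lt;/code&gt; and diff the metrics that matter. The dictionaries below are hypothetical captures, not real output.&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-python"&gt;# Hypothetical --perf-debug --format json captures for the same query with a
# narrow and a wide time range.
narrow = {"parquet_files": 1, "output_rows": 20, "execution_duration_secs": 0.247}
wide = {"parquet_files": 8, "output_rows": 160, "execution_duration_secs": 1.902}

# Ratio of each metric between the two runs.
ratios = {key: wide[key] / narrow[key] for key in narrow}
for key, ratio in ratios.items():
    print(f"{key}: {narrow[key]} vs {wide[key]} (x{ratio:.1f})")&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If the ratios grow much faster than the time range does, the wider window is pulling in disproportionately expensive data.&lt;/p&gt;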

&lt;p&gt;You can learn a lot through experimentation and finding correlations beyond those suggested here. We hope this minor feature makes it a little easier!&lt;/p&gt;
</description>
      <pubDate>Tue, 20 Jan 2026 08:00:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/new-query-tuning-feature-influxdb/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/new-query-tuning-feature-influxdb/</guid>
      <category>Developer</category>
      <author>Reid Kaufmann (InfluxData)</author>
    </item>
    <item>
      <title>Optimizing Queries in InfluxDB 3 Using Progressive Evaluation</title>
      <description>&lt;p&gt;In a &lt;a href="https://www.influxdata.com/blog/making-recent-value-queries-hundreds-times-faster/?utm_source=website&amp;amp;utm_medium=direct&amp;amp;utm_campaign=query_optimization_progressive_evaluation_influxdb&amp;amp;utm_content=blog"&gt;previous post&lt;/a&gt;, we described the technique that makes the “most recent values” queries hundreds of times faster and has benefited many of our customers. The idea behind this technique is to progressively evaluate time-organized files until we reach the most recent values. Since then, we have received questions like “What queries support progressive evaluation?” “How do we verify that a query is progressively evaluated?” “Are there certain file organizations that progressive evaluation won’t help?” This blog post answers those questions.&lt;/p&gt;

&lt;h2 id="queries-that-support-progressive-evaluation"&gt;Queries that support progressive evaluation&lt;/h2&gt;

&lt;p&gt;Currently, this technique is only available for SQL queries; it is not yet applicable to InfluxQL queries. Your SQL query must have the clause &lt;code class="language-markup"&gt;ORDER BY time DESC&lt;/code&gt; (or &lt;code class="language-markup"&gt;ASC&lt;/code&gt;). In addition, the optimization does not yet support &lt;strong&gt;&lt;em&gt;expressions&lt;/em&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;em&gt;aliases&lt;/em&gt;&lt;/strong&gt; (including &lt;code class="language-markup"&gt;AT TIME ZONE&lt;/code&gt;), or &lt;strong&gt;&lt;em&gt;aggregations&lt;/em&gt;&lt;/strong&gt; in the &lt;code class="language-markup"&gt;SELECT&lt;/code&gt; clause. In other words, everything in the &lt;code class="language-markup"&gt;SELECT&lt;/code&gt; clause must be a simple table column.&lt;/p&gt;

&lt;h3 id="examples-of-supported-queries"&gt;Examples of supported queries&lt;/h3&gt;

&lt;pre class=""&gt;&lt;code class="language-sql"&gt;SELECT host, temperature 
FROM   machine 
WHERE  time &amp;gt; now - interval ‘1 day’ and region = ‘US’
ORDER BY time ASC;

SELECT host, temperature 
FROM   machine 
WHERE  time &amp;gt; now - interval ‘1 day’ and region = ‘US’
ORDER BY time DESC
LIMIT  10;&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id="examples-of-unsupported-queries"&gt;Examples of unsupported queries&lt;/h3&gt;

&lt;p&gt;These queries are not optimized using progressive evaluation yet. We hope to lift the restrictions in a future release.&lt;/p&gt;

&lt;p&gt;Query with an expression (&lt;code class="language-markup"&gt;temperature + 2&lt;/code&gt;)&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-sql"&gt;SELECT host, temperature + 2
FROM   machine 
WHERE  time &amp;gt; now - interval ‘1 day’ and region = ‘US’
ORDER BY time ASC;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Query with an alias (&lt;code class="language-markup"&gt;as host_name&lt;/code&gt;)&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-sql"&gt;SELECT host as host_name, time 
FROM   machine 
WHERE  time &amp;gt; now - interval ‘1 day’ and region = ‘US’
ORDER BY time ASC;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Query that specifies a time zone and then an alias (&lt;code class="language-markup"&gt;AT TIME ZONE 'Europe/Oslo' as time&lt;/code&gt;)&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-sql"&gt;SELECT host, time AT TIME ZONE ‘Europe/Oslo’ as time 
FROM   machine 
WHERE  time &amp;gt; now - interval ‘1 day’ and region = ‘US’
ORDER BY time DESC
LIMIT  10;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Query with an aggregate (&lt;code class="language-markup"&gt;min(temperature)&lt;/code&gt;)&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-sql"&gt;SELECT min(temperature)
FROM   machine 
WHERE  time &amp;gt; now - interval ‘1 day’ and region = ‘US’
ORDER BY time DESC
LIMIT  10;&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id="signal-that-a-query-is-evaluated-progressively"&gt;Signal that a query is evaluated progressively&lt;/h2&gt;

&lt;p&gt;If &lt;code class="language-markup"&gt;ProgressiveEvalExec&lt;/code&gt; is in your query plan, it is optimized using progressive evaluation (see &lt;a href="https://www.influxdata.com/blog/how-read-influxdb-3-query-plans/"&gt;this post&lt;/a&gt; for how to get and read the query plan). However, the absence of progressive evaluation does not mean your query will run slowly. We only apply it when it actually benefits your query.&lt;/p&gt;

&lt;h2 id="file-organizations-that-benefit-from-progressive-evaluation"&gt;File organizations that benefit from progressive evaluation&lt;/h2&gt;

&lt;p&gt;To understand when progressive evaluation benefits your query, we first need to understand data organization, which is one of the most important factors affecting query performance.&lt;/p&gt;

&lt;h3 id="data-organization-in-influxdb-30"&gt;Data organization in InfluxDB 3.0&lt;/h3&gt;

&lt;p&gt;Below is a brief description of how data is organized in InfluxDB 3.0 (see our &lt;a href="https://www.influxdata.com/blog/influxdb-3-0-system-architecture/?utm_source=website&amp;amp;utm_medium=direct&amp;amp;utm_campaign=query_optimization_progressive_evaluation_influxdb&amp;amp;utm_content=blog"&gt;system architecture&lt;/a&gt; and &lt;a href="https://www.influxdata.com/blog/compactor-hidden-engine-database-performance/?utm_source=website&amp;amp;utm_medium=direct&amp;amp;utm_campaign=query_optimization_progressive_evaluation_influxdb&amp;amp;utm_content=blog"&gt;data compaction&lt;/a&gt; for the complete data cycle, how data is compacted, and how it benefits query performance).&lt;/p&gt;

&lt;p&gt;As a time series database, data from the table &lt;code class="language-markup"&gt;machine&lt;/code&gt; always includes a &lt;code class="language-markup"&gt;time&lt;/code&gt; column representing the time of an event, such as temperature at 9:30 am UTC. Figure 1 shows three different stages of data organization. Each rectangle in the figure illustrates a chunk of data. &lt;strong&gt;C&lt;/strong&gt; represents data that is not yet persisted and usually includes the most recent values. &lt;strong&gt;L&lt;/strong&gt; represents the level of different persisted files. &lt;strong&gt;L0&lt;/strong&gt; is used for files of newly ingested and persisted data. They are usually small and contain recent values. However, L0 files of backfilled data can be as old as desired. &lt;strong&gt;L1&lt;/strong&gt; files store the results of compacting many small L0 files. We also have &lt;strong&gt;L2&lt;/strong&gt; files but they are beyond the scope of this topic and do not change how progressive evaluation works.&lt;/p&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/6qkOCQrnjhOT8zlx7yZj4w/81b28dfcb4df0a72ef72d1ea830b2ccb/Data_organization_in_InfluxDB_3-0.png" alt="Data organization in InfluxDB 3-0" /&gt;&lt;/p&gt;

&lt;p class="has-text-centered"&gt;Figure 1 (borrowed from the &lt;a href="/blog/compactor-hidden-engine-database-performance/"&gt;compaction blog post&lt;/a&gt;): Four stages of data organization after two rounds of compaction.&lt;/p&gt;

&lt;p&gt;In stage 1, all data are in small L0 files. In stage 2, data in stage 1 has been compacted to larger L1 files, while some new data are persisted in a few small L0 files and some are in a not-yet-persisted chunk (C). If you ingest new data most of the time, your data organization mostly looks like stage 2 or stage 3. However, if you backfill data, your data organization can combine stages 1 and stage 2 or stage 3. Thus, depending on how you ingest data and how fast the compactor keeps up with your ingest workload, there may be few or many overlapped and small files. Stage 3 is what we call “well-compacted data” and is usually best for query performance. The goal of the compactor is to have most of your data in stage 3. Avoiding frequent backfilling of data also helps keep your data well-compacted.&lt;/p&gt;

&lt;h3 id="application-of-progressive-evaluation-in-various-overlap-scenarios"&gt;Application of progressive evaluation in various overlap scenarios&lt;/h3&gt;

&lt;p&gt;Let’s go over examples of querying different data sets. Figure 2 shows the data organization of the table &lt;code class="language-markup"&gt;machine&lt;/code&gt;. We use F as the prefix for file names; each file can be either L0 or L1. Files F1, F2, F6, and F7 do not time-overlap with any other files. Files F3, F4, and F5 overlap with each other, and file F8 overlaps with F9, which overlaps with chunk C.&lt;/p&gt;

&lt;p&gt;&lt;img src="//images.ctfassets.net/o7xu9whrs0u9/1jwYhroMEa2u7bjbHyPSgi/4924600d5db1a009e0aa4723f0fa7d63/Data_organization_of_table_machine.png" alt="Data organization of table machine" /&gt;&lt;/p&gt;

&lt;p class="has-text-centered"&gt;Figure 2: Data organization of table &lt;code class="language-markup"&gt;machine&lt;/code&gt;&lt;/p&gt;

&lt;h4 id="reading-non-overlapped-files-only"&gt;Reading Non-Overlapped Files Only&lt;/h4&gt;

&lt;p&gt;If your query asks for latest data before t1:&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-sql"&gt;SELECT temperature 
FROM   machine 
WHERE  time &amp;lt; t1 and region = ‘US’
ORDER BY time DESC LIMIT  1;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The query will be optimized with progressive evaluation because the files needed, F1 and F2, do not overlap. The simplified query plan is as follows:&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-sql"&gt;ProgressiveEvalExec: fetch=1
    SortExec: TopK(fetch=1), expr=[time DESC], preserve_partitioning=[true]   
         ParquetExec: file_groups={2 groups: [F2], [F1]}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A few essential properties in the query plan are needed for &lt;code class="language-markup"&gt;ProgressiveEvalExec&lt;/code&gt; to work correctly:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Files in &lt;code class="language-markup"&gt;ParquetExec&lt;/code&gt; are sorted on time descending: F2, then F1.&lt;/li&gt;
  &lt;li&gt;&lt;code class="language-markup"&gt;preserve_partitioning=[true]&lt;/code&gt; means data of 2 file groups, [F2] and [F1], are sorted in their own group and won’t be merged. This is important for us to be able to fetch data from F2 before fetching F1.&lt;/li&gt;
  &lt;li&gt;&lt;code class="language-markup"&gt;fetch=1&lt;/code&gt; means the query will stop running as soon as it gets a row that meets the query filters. In other words, if there is at least one row in file F2 with &lt;code class="language-markup"&gt;US&lt;/code&gt; as a region, F1 will never be read.&lt;/li&gt;
&lt;/ol&gt;
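
&lt;p&gt;A toy model (plain Python, not the actual DataFusion operator) captures why these properties matter for &lt;code class="language-markup"&gt;fetch=1&lt;/code&gt;: streams arrive most-recent first, and the moment one stream yields a matching row, the remaining streams are skipped. The row data below is invented for illustration.&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-python"&gt;# Each stream is a file's rows as (time, region) tuples, already sorted on
# time descending within the file. F2 is listed first because it is newer.
streams = {"F2": [(18, "EU"), (15, "US")], "F1": [(9, "US"), (5, "US")]}

scanned = []
result = None
for name, rows in streams.items():  # progressive order: F2, then F1
    scanned.append(name)
    matches = [row for row in rows if row[1] == "US"]
    if matches:
        result = matches[0]  # first matching row, i.e. fetch=1
        break

print(result, scanned)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because F2 holds a US row, F1 is never read at all, which is exactly the saving ProgressiveEvalExec aims for.&lt;/p&gt;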

&lt;p&gt;The same query sorted on time &lt;strong&gt;ascending&lt;/strong&gt; will look like this:&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-sql"&gt;ProgressiveEvalExec: fetch=1
    SortExec: TopK(fetch=1), expr=[time ASC], preserve_partitioning=[true]   
         ParquetExec: file_groups={2 groups: [F1], [F2]}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You will get a similar query plan if your query reads data between t2 and t3, which includes only non-overlapped files.&lt;/p&gt;

&lt;h4 id="reading-overlapped-files-only"&gt;Reading Overlapped Files Only&lt;/h4&gt;

&lt;p&gt;Now let’s look at the query reading data between t1 and t2.&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-sql"&gt;SELECT temperature 
FROM   machine 
WHERE  time &amp;lt; t2 and time &amp;gt; t1 and region = ‘US’ 
ORDER BY time DESC LIMIT  1;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because all three needed files, F3, F4, and F5, overlap, we need to merge data for &lt;a href="https://www.influxdata.com/blog/using-deduplication-eventually-consistent-transactions/"&gt;deduplication&lt;/a&gt; and they cannot be evaluated one by one progressively. The simplified query plan will look like this:&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-sql"&gt;SortExec: TopK(fetch=1), expr=[time DESC], preserve_partitioning=[false]
  DeduplicateExec:
     SortPreservingMergeExec:
         ParquetExec: file_groups={3 groups: [F4], [F3], [F5]}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Without &lt;code class="language-markup"&gt;ProgressiveEvalExec&lt;/code&gt;, the files in &lt;code class="language-markup"&gt;ParquetExec&lt;/code&gt; can be in any order and group because the groups will be read in parallel and merged into one stream for deduplication.&lt;/p&gt;

&lt;p&gt;Similarly, progressive evaluation won’t be applied if your query reads overlapped data after t3.&lt;/p&gt;

&lt;h4 id="reading-a-mixture-of-non-overlapped-and-overlapped-files"&gt;Reading a Mixture of Non-Overlapped and Overlapped Files&lt;/h4&gt;

&lt;p&gt;When your query reads a mixture of non-overlapped and overlapped data, progressive evaluation is applied, and the data is split and grouped accordingly. Let’s look at a query that reads data before t2.&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-sql"&gt;SELECT temperature 
FROM   machine 
WHERE  time &amp;lt; t2 and region = ‘US’ 
ORDER BY time DESC LIMIT  1;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The simplified query plan will look like this:&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-sql"&gt;ProgressiveEvalExec: fetch=1
    SortExec: TopK(fetch=1), expr=[time DESC], preserve_partitioning=[false]
       DeduplicateExec:
          SortPreservingMergeExec:
              ParquetExec: file_groups={3 groups: [F4], [F3], [F5]}
    SortExec: TopK(fetch=1), expr=[time DESC], preserve_partitioning=[true]   
         ParquetExec: file_groups={2 groups: [F2], [F1]}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Even though files F3, F4, and F5 overlap, they do not overlap with F1 and F2 and contain more recent data. Therefore, the &lt;strong&gt;subplan&lt;/strong&gt; of F3, F4, and F5 is progressively evaluated with F2 and F1. Note that the number of input streams into &lt;code class="language-markup"&gt;ProgressiveEvalExec&lt;/code&gt; is three: one for the merge of F3, F4, and F5, one for F2, and one for F1. These three streams are evaluated progressively in that order.&lt;/p&gt;

&lt;p&gt;If the query is sorted &lt;strong&gt;ascending&lt;/strong&gt;, the progressive order will be opposite:&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-sql"&gt;ProgressiveEvalExec: fetch=1
    SortExec: TopK(fetch=1), expr=[time ASC], preserve_partitioning=[true]   
         ParquetExec: file_groups={2 groups: [F1], [F2]}
    SortExec: TopK(fetch=1), expr=[time ASC], preserve_partitioning=[false]
       DeduplicateExec:
          SortPreservingMergeExec:
              ParquetExec: file_groups={3 groups: [F4], [F3], [F5]}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Three streams still go into &lt;code class="language-markup"&gt;ProgressiveEvalExec&lt;/code&gt; but in the opposite order: F1, F2, and the F3, F4, and F5 merge.&lt;/p&gt;

&lt;p&gt;Similarly, let’s read all data:&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-sql"&gt;SELECT temperature 
FROM   machine 
WHERE  time &amp;lt; now and region = ‘US’ 
ORDER BY time DESC LIMIT  1;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The query plan is getting more complicated but follows the same rules: subplans of overlapped data will be put in the right order and progressively evaluated with non-overlapped files.&lt;/p&gt;

&lt;pre class=""&gt;&lt;code class="language-sql"&gt;ProgressiveEvalExec: fetch=1
    SortExec: TopK(fetch=1), expr=[time DESC], preserve_partitioning=[false]
       DeduplicateExec:
          SortPreservingMergeExec:
              SortExec:
                 RecordBatchExec: {C}
              ParquetExec: file_groups={2 groups: [F8], [F9]}
    SortExec: TopK(fetch=1), expr=[time DESC], preserve_partitioning=[true]   
         ParquetExec: file_groups={2 groups: [F7], [F6]}
    SortExec: TopK(fetch=1), expr=[time DESC], preserve_partitioning=[false]
       DeduplicateExec:
          SortPreservingMergeExec:
              ParquetExec: file_groups={3 groups: [F4], [F3], [F5]}
    SortExec: TopK(fetch=1), expr=[time DESC], preserve_partitioning=[true]   
         ParquetExec: file_groups={2 groups: [F2], [F1]}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There are six non-overlapped streams of data progressively evaluated by &lt;code class="language-markup"&gt;ProgressiveEvalExec&lt;/code&gt; in this order:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The merge of C, F8 and F9&lt;/li&gt;
  &lt;li&gt;F7&lt;/li&gt;
  &lt;li&gt;F6&lt;/li&gt;
  &lt;li&gt;The merge of F3, F4 and F5&lt;/li&gt;
  &lt;li&gt;F2&lt;/li&gt;
  &lt;li&gt;F1&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If your query orders data on time ascending, you will have a similar query plan but in the opposite order.&lt;/p&gt;

&lt;h2 id="cache-implications-progressive-evaluation-may-help-other-queries-latency"&gt;Cache implications: progressive evaluation may help other queries’ latency&lt;/h2&gt;

&lt;p&gt;For workloads that depend heavily on cached files for the lowest possible latency but run near the cache limit, progressive evaluation can make a significant performance difference. Obviously, not traversing extra parquet files reduces CPU time for the optimized query. And depending on how much wider the query’s time bound is than the data actually needed to satisfy the LIMIT, the database may bring far fewer files into the cache, so other queries’ files need not be evicted, potentially reducing the latency of other queries on the system.&lt;/p&gt;

&lt;h2 id="summing-up"&gt;Summing up&lt;/h2&gt;

&lt;p&gt;If your query selects only plain table columns and orders data on time, InfluxDB 3.0 will automatically use progressive evaluation to improve your query performance, as long as at least some of the queried data is in non-overlapped files. Progressive evaluation cannot be used when all of your data overlaps.&lt;/p&gt;
</description>
      <pubDate>Thu, 14 Nov 2024 07:00:00 +0000</pubDate>
      <link>https://www.influxdata.com/blog/query-optimization-progressive-evaluation-influxdb/</link>
      <guid isPermaLink="true">https://www.influxdata.com/blog/query-optimization-progressive-evaluation-influxdb/</guid>
      <category>Developer</category>
      <author>Nga Tran, Reid Kaufmann (InfluxData)</author>
    </item>
  </channel>
</rss>
