Here at InfluxData, we have a pretty great community. You all ask smart questions and contribute to the constant improvement of our platform. You’re the butter on our bread and the glaze on our donut. Okay, now I’m hungry.
We want to answer as many of your questions as possible, so we’re going to tackle some of the most common questions from our community.
This week’s question: When do I use Kapacitor instead of Continuous Queries?
If you’re asking this question, 1) you’re not alone and 2) you’re right to wonder.
Let’s start from the beginning.
From the docs, “Continuous Queries (CQ) are InfluxQL queries that run automatically and periodically on realtime data and store query results in a specified measurement.”
CQs were designed for downsampling: aggregating the data you want to keep and writing the results to a new measurement. Your time series data comes in thousands or millions of points; you don’t want to store them all forever unless absolutely necessary because the disk requirements quickly get out of hand. CQs offer a way for you to keep the summaries of your data without keeping all of the individual points. With CQs, you can let the full-resolution data expire with a retention policy (or drop it manually) and keep only what you need.
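If downsampling still feels abstract, here is a rough sketch of the idea in plain Python. This is not how InfluxDB implements CQs; the `downsample` function, the 5-minute bucket size, and the sample data are all made up for illustration. It simply groups points into fixed time buckets and keeps the mean of each one.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets.

    Timestamps are Unix seconds; each bucket is labeled by its start time.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Round the timestamp down to the start of its bucket.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical raw data: one reading per minute for ten minutes.
raw = [(60 * i, float(i)) for i in range(10)]

# Ten raw points collapse into two 5-minute summaries.
summary = downsample(raw, bucket_seconds=300)
print(summary)
```

Ten points go in and two come out. Once the summaries are stored, the raw points can expire without losing the trend you care about.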
Kapacitor is the “K” in our TICK stack, a powerful processing engine that can sometimes live behind a veil of secrecy, like that neighbor you had as a kid you were convinced would destroy you if you lost a ball in their yard. But Kapacitor isn’t scary. It’s glorious.
If you use Kapacitor, you probably know there are two types of tasks: streaming and batch. You can read this guide on when to use batch over streaming if you’re interested in that sort of thing.
Kapacitor can do so much, and that’s why we’re going to focus on one feature here: Kapacitor can query data from InfluxDB on a schedule, transform it (with built-in or user-defined functions), and store the transformed data back in InfluxDB. It can also process the data as a stream, not just as a periodic query, and it can even alert based on conditions you set. That sounds a lot like a Continuous Query. We could query our InfluxDB instance, run some aggregate functions, and store the results in a different InfluxDB instance with an infinite retention policy.
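To make "process the data as a stream" a bit more concrete, here is a minimal Python sketch of a windowed mean that is updated point by point, with no periodic re-query. It is only an illustration of the idea, not Kapacitor's code or API: the `StreamingWindowMean` class is hypothetical, and it assumes points arrive in timestamp order.

```python
class StreamingWindowMean:
    """Incrementally average values over back-to-back time windows.

    Assumes points arrive in timestamp order (Unix seconds).
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.window_start = None
        self.total = 0.0
        self.count = 0

    def push(self, ts, value):
        """Accept one point; return (window_start, mean) when a window closes."""
        start = ts - (ts % self.window_seconds)
        closed = None
        if self.window_start is not None and start != self.window_start:
            # The point belongs to a new window, so emit the finished one.
            closed = (self.window_start, self.total / self.count)
            self.total, self.count = 0.0, 0
        self.window_start = start
        self.total += value
        self.count += 1
        return closed


# Hypothetical feed: one reading per minute, handled as each point arrives.
agg = StreamingWindowMean(window_seconds=300)
emitted = []
for i in range(11):
    result = agg.push(60 * i, float(i))
    if result is not None:
        emitted.append(result)
print(emitted)
```

Each point is touched once as it arrives, and a summary is emitted only when its window closes. A scheduled query, by contrast, goes back to the database for the whole window every time it runs.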
Here’s where we start to wonder whether CQs or Kapacitor is the best tool for the job. Here are a few things that can help you decide.
- Are you running a significant number of CQs? Use Kapacitor (streaming task).
Significant is relative, obviously, but if your CQs are causing your InfluxDB instance to lock up, falling behind schedule, or degrading your dashboard query workloads, move the workload to Kapacitor to free up the database’s resources.
Sidenote: A streaming task will be more performant because it works on data as it arrives, while a batch task still queries InfluxDB and creates extra load.
- Do you want to perform more complex data transformations? Use Kapacitor.
Kapacitor offers all of the InfluxQL functionality, as well as the ability to add functions written by the user (that’s you!) in a language of your choosing.
- Are you writing queries to downsample a limited amount of data? Use CQs.
CQs only become a performance problem when you run a large number of them or when the queries are complex. If you know what you need and how to achieve it with InfluxQL, use CQs to get the job done. Users with the Enterprise version of InfluxDB can also enable CQs on a subset of their cluster, so that those nodes can focus on running CQs if needed.
If you just realized you should be using Kapacitor, read this guide on how to use Kapacitor as a Continuous Query engine.
If you just realized you want to write CQs, read this introduction to writing Continuous Queries.
Do you have more questions about your use case? Ask us on our community site.
Suggest more blogs you’d like to see here or say hi on Twitter: