TL;DR InfluxDB Tech Tips — Using Tasks and Checks for Monitoring with InfluxDB

In this post, we learn how to use tasks in combination with checks for monitoring with InfluxDB.

Q: What is the monitoring workflow for InfluxDB? A: According to the documentation, the monitoring workflow involves the following steps:

  1. A check in InfluxDB queries data and assigns a status with a _level based on specific conditions.
  2. InfluxDB stores the output of a check in the statuses measurement in the _monitoring system bucket.
  3. Notification rules check data in the statuses measurement, and based on conditions set in the notification rule, send a message to a notification endpoint.
  4. InfluxDB stores notifications in the notifications measurement in the _monitoring system bucket.
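
To make steps 2 and 4 concrete, here is a minimal Flux sketch that reads back what the workflow stores in the _monitoring bucket. It uses monitor.from() and monitor.logs() from the influxdata/influxdb/monitor package. The one-hour time range and the yield names are arbitrary choices for illustration.

```flux
import "influxdata/influxdb/monitor"

// Step 2: statuses written by checks (read from the statuses measurement).
// Stored level values are lowercase, for example "crit".
monitor.from(start: -1h, fn: (r) => r._level == "crit")
    |> yield(name: "crit_statuses")

// Step 4: notification events written by notification rules
// (read from the notifications measurement).
monitor.logs(start: -1h)
    |> yield(name: "notifications")
```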

Q: What resources are available to me for monitoring with Flux and InfluxDB? A: Here is a list of resources that might be useful to you if you’re looking to monitor your data with InfluxDB:

Q: What is the difference between a check and an alert? A: A check queries your data in InfluxDB and applies a status to it. An alert is the notification that gets sent to your notification endpoint, based on your notification rules.

Q: How do you create a check? A: You can create a check through the UI. Let’s create a check on the percentage of mem available for our system. Here are the steps for creating a check.

  1. Navigate to the Alerts tab in the UI and create a new check. In this example, we'll be creating a Threshold check. However, you also have the option to create a Deadman check.
  2. Name your check "mem available percentage". Use the Data Explorer to select the fields you want to create a check on. Apply an aggregation function to the data. Here we're monitoring the max value of the percentage of mem available every 15s.
     Step 1 for creating a check. Defining a query and applying an aggregation to it.
  3. Configure your thresholds. Since we're monitoring the available percentage, I set my check status to "CRIT" when my system has less than 20% mem available. I set my check status to "WARN" when my system has between 20% and 30% mem available (if you look closely, you'll see I defined the "WARN" status when the mem available is between 20.5% and 29.5% to allow you to easily visualize the thresholds). Although the configuration is not pictured here, I set my check status to "OK" when my system has more than 30% mem available. Finally, I can click the green check in the upper right corner to create my check.
     Step 2 for creating a check. Configuring the check. "CRIT" (red line), "WARN" (yellow line), and "OK" (green line) statuses configured for mem percent available.
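
For readers who prefer to see the check as Flux, the sketch below approximates what a threshold check like this one looks like when written as a task with monitor.check(). It is not the exact script the UI generates. The bucket name "telegraf", the measurement "mem", the field "available_percent", and the check ID are assumptions for illustration; substitute your own values.

```flux
import "influxdata/influxdb/monitor"
import "influxdata/influxdb/schema"

option task = {name: "mem available percentage", every: 15s}

// Hypothetical check metadata; a check created in the UI gets its own ID.
check = {
    _check_id: "0000000000000001",
    _check_name: "mem available percentage",
    _type: "threshold",
    tags: {},
}

from(bucket: "telegraf") // assumed bucket name
    |> range(start: -task.every)
    |> filter(fn: (r) => r._measurement == "mem") // assumed measurement
    |> filter(fn: (r) => r._field == "available_percent") // assumed field
    |> aggregateWindow(every: 15s, fn: max, createEmpty: false)
    // monitor.check() expects fields as columns, so pivot first.
    |> schema.fieldsAsCols()
    |> monitor.check(
        data: check,
        messageFn: (r) => "Check: ${r._check_name} is: ${r._level}",
        crit: (r) => r.available_percent < 20.0,
        warn: (r) => r.available_percent > 20.5 and r.available_percent < 29.5,
        ok: (r) => r.available_percent > 30.0,
    )
```

Values that match none of the predicates (for example, exactly 20.2%) receive the "unknown" level, which is a side effect of leaving gaps between the thresholds.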

Q: How can I see the output of my check? Where is the _level column? A: Navigate to the default _monitoring bucket in your InfluxDB instance and filter for your check ID or check name to view the results.
Viewing the output of the check
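
The same lookup can be written as a query. This is a minimal sketch that assumes the check is named "mem available percentage", as in the example above; the one-hour range is arbitrary.

```flux
from(bucket: "_monitoring")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "statuses")
    // _check_name and _level are stored as tags on each status point.
    |> filter(fn: (r) => r._check_name == "mem available percentage")
    |> keep(columns: ["_time", "_check_name", "_level", "_field", "_value"])
```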

Q: A threshold check or deadman check isn't useful to me at the moment. I need to perform data transformation first in order to be able to take advantage of this. What do I do? A: You can create a task to transform your data first and write the output to a new bucket or measurement. Then you can create a check on your transformed data. For example, let's say you want to alert on power levels that exceed a certain value, but you only have current and voltage data. In order to create a check on power, you first need to run a task to calculate the power from the voltage and current. Create a task with these steps:
  1. Navigate to the task tab.
  2. Write your task. Include the task configuration options, the data source, and the destination. Make sure you have created a destination bucket before writing the task, because the to() function doesn't create new buckets. In this example, we're creating a task to calculate the power from the current and voltage. Now that we have created this task, we can create a check to alert us when the power level is too high or too low.
     An example of a task that calculates the power of a circuit from current and voltage data. This task runs at 5m intervals. It uses the pivot() function to transform the data so we can perform math on the two fields in the same measurement. The map() function is responsible for executing the math.
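
Since the task itself appears only as a screenshot, here is a rough sketch of what a task along those lines might look like. The bucket names "circuit" and "circuit_power", the measurement "sensors", the field names "current" and "voltage", and the task name are all assumptions for illustration, and both fields are assumed to be floats.

```flux
option task = {name: "calculate power", every: 5m}

from(bucket: "circuit") // assumed source bucket
    |> range(start: -task.every)
    |> filter(fn: (r) => r._measurement == "sensors") // assumed measurement
    |> filter(fn: (r) => r._field == "current" or r._field == "voltage")
    // Put current and voltage side by side in each row.
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    // Power = current * voltage, written back as a single "power" field.
    |> map(fn: (r) => ({r with _field: "power", _value: r.current * r.voltage}))
    // The destination bucket must already exist.
    |> to(bucket: "circuit_power")
```

A threshold check can then be created on the "power" field in the destination bucket.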

Note: You can also apply this type of math with the Telegraf Starlark Processor Plugin. Please look at this blog for the corresponding example. However, you should use Flux and tasks to perform complicated data transformations.

Q: My data has levels that aren't "CRIT", "WARN", "INFO", or "OK". How can I create checks on data with different statuses? A: Run a task and label your data with your own custom statuses. This question highlights an important distinction between the check statuses and the status of your data. Check statuses ("CRIT", "WARN", "INFO", or "OK") are used to create notification rules and alerts. After labeling your data with your own custom statuses, assign your custom status to a check status and create notification rules around it. Take advantage of notification messages to correlate your custom status with a check status. For example, imagine that you're monitoring a solar battery. You want to assign your data with the status of charging (CH) or discharging (DH). The data transformation in your task should include conditional logic like this:

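    // Fragment of a task body; task.every comes from the task's own option block.
    from(bucket: "solar")
        |> range(start: -task.every)
        |> filter(fn: (r) => r["_measurement"] == "battery")
        |> filter(fn: (r) => r["_field"] == "kWh")
        // Rate of change of the stored energy; negative values are allowed.
        |> derivative(unit: 3s, nonNegative: false, columns: ["_value"], timeColumn: "_time")
        // Label each point as charging (CH) or discharging (DH).
        |> map(fn: (r) => ({
            r with
            _battery_level:
                if r._value > 0.0 then "CH"
                else "DH"
        }))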

Now if you need to alert when the battery is discharging, you can create a check that filters your data with |> filter(fn: (r) => r["_battery_level"] == "DH") and triggers when the value is less than 0. Correlate the check status with your custom status by creating the following message:

    Check: ${ r._check_name } is:${string(v: r._battery_level)}

The string() conversion inside the Flux string interpolation is redundant here because our _battery_level is already a string. However, I included it to demonstrate how you can convert a field's type to include its value in your message if you need to.

Important Note: If you use this workflow for custom statuses, please make sure not to label your custom status column with _level, as that is reserved for check statuses.

Q: I need to add more than 4 statuses for my check. What do I do? A: Unfortunately, you can't add more than 4 statuses in one check. However, you can always create a new check to include more statuses. For example, you can separate the checks with informative names that distinguish statuses based on tags or threshold levels. Again, I encourage you to take advantage of notification messages to bring clarity to your alerts.

Note: If this approach feels cumbersome and you're interested in a feature enhancement that allows you to add more than 4 statuses for one check, please comment on this issue #19208.
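
Returning to the solar battery example, the discharging check described earlier could be sketched as a custom check task like the one below. This is only an illustration under several assumptions: the labeling task is assumed to write its output to a bucket named "solar_status", and the check ID, check name, and task name are hypothetical.

```flux
import "influxdata/influxdb/monitor"
import "influxdata/influxdb/schema"

option task = {name: "battery discharging check", every: 1m}

// Hypothetical check metadata.
check = {
    _check_id: "0000000000000002",
    _check_name: "battery discharging",
    _type: "custom",
    tags: {},
}

from(bucket: "solar_status") // assumed output bucket of the labeling task
    |> range(start: -task.every)
    |> filter(fn: (r) => r._measurement == "battery")
    |> filter(fn: (r) => r._field == "kWh")
    // Keep only points carrying the custom "discharging" status.
    |> filter(fn: (r) => r._battery_level == "DH")
    |> schema.fieldsAsCols()
    |> monitor.check(
        data: check,
        // Correlates the check status with the custom status in the message.
        messageFn: (r) => "Check: ${r._check_name} is: ${string(v: r._battery_level)}",
        // Map the custom status onto a check status: negative rate means CRIT.
        crit: (r) => r.kWh < 0.0,
    )
```

Note that the custom status lives in _battery_level, while monitor.check() writes its own status to the reserved _level column.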