InfluxData Blog - Cole Bowden

Generate Synthetic Time Series Data in InfluxDB 3

Cole Bowden (InfluxData) — Fri, 12 Jun 2026 08:00:00 +0000

Getting InfluxDB 3 up and running is a pretty lightweight process with the installation script. Getting time series data into it is the next step, and for exploration, basic testing, or scenarios where you don’t have a stream of time series data ready to write, that can be a point of friction.

That hurdle is particularly high when you want to test the rest of the system around the data you’d be writing: dashboards, alerts, replication, network connectivity, edge devices, server sizing, or Processing Engine workflows. You can’t always write production data into a freshly installed database, and you may not have that data yet.

Two new InfluxDB 3 plugins help with exactly that: the Bird Tracking Simulator and the Signal Generator. Both are scheduled plugins that generate data directly to InfluxDB 3, making it easy to start writing realistic sample data with a single trigger. The Bird Tracking Simulator creates synthetic bird telemetry, while the Signal Generator creates configurable waveform data for sensor-like or metric-like use cases.

Why generate sample data this way?

A lot of InfluxDB workflows are easier to understand once data is actively moving through the system: a dashboard is easier to build when the line and the most recent datapoint keep changing, an alert is easier to validate when values cross a threshold, edge replication is easier to test when writes are arriving continuously, and a small server or single-board computer is easier to evaluate when you can watch how it behaves under a steady stream of points.

These plugins are meant to make that first step simple. Create a database, create a trigger, and InfluxDB 3 starts generating data on a schedule. From there, you can query it, visualize it, replicate it, downsample it, or use it as input for other Processing Engine plugins.

Bird Tracking Simulator

The Bird Tracking Simulator generates a stream of synthetic bird telemetry. On its first run, it creates a persistent flock of named birds, assigns each bird a variety of tags, such as species, name, and range, and stores the flock in the Processing Engine cache. Each scheduled execution of the plugin advances the flock by updating a number of measurements, including speed, heading, latitude, and longitude, with the birds going on random walks within a predefined range for each species.

The shape of the data is useful for a few reasons. It has multiple entities, tags, and geospatial fields. It changes over time in a way that is easy to inspect visually. That makes it a good fit for testing dashboards, map panels, edge replication, and basic query patterns that group or filter by tags.

The plugin writes to the bird_tracking measurement. Its configuration is also intentionally small, specified with simple trigger arguments: bird_count controls how many persistent birds are tracked, and points_per_bird controls how many movement points each bird emits per scheduled run. The defaults are 25 birds and 1 point per bird. The number of data points the plugin generates is a simple product of these two options and the trigger specification for how often the plugin runs.
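To make that product concrete, here is a small back-of-the-envelope calculation. The function name is hypothetical, and the 10-second interval is only an assumed example of a trigger specification, not a plugin default.

```python
def points_per_minute(bird_count: int, points_per_bird: int, interval_seconds: float) -> float:
    """Estimate write volume: points per run times runs per minute."""
    points_per_run = bird_count * points_per_bird
    runs_per_minute = 60 / interval_seconds
    return points_per_run * runs_per_minute


# Default flock (25 birds, 1 point each) on an assumed every:10s schedule.
print(points_per_minute(25, 1, 10))

# A denser configuration: 10 birds, 10 points each, every:10s.
print(points_per_minute(10, 10, 10))
```

The same arithmetic works in reverse when you are sizing a test: pick a target write rate, then choose the flock size and interval that produce it.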

The plugin requires Faker, so install that first:

influxdb3 install package Faker

Then create a database and a scheduled trigger:

influxdb3 create database sample_data

influxdb3 create trigger \
  --database sample_data \
  --path "gh:influxdata/bird_data_simulator/bird_data_simulator.py" \
  --trigger-spec "every:10s" \
  --trigger-arguments bird_count=10,points_per_bird=10 \
  bird_tracking_demo

After the trigger has run a few times, query the generated data:

influxdb3 query \
  --database sample_data \
  "SELECT * FROM bird_tracking ORDER BY time DESC LIMIT 5"

For a denser stream, increase the flock size, increase the points per bird, or adjust the trigger interval. That gives you a simple way to create a steady stream of entity-oriented time series data. You can use it to populate dashboards, test writes across a network, or quickly confirm that a new InfluxDB 3 setup is receiving, storing, and querying data as expected.

Signal Generator

The Signal Generator achieves many of the same things, but by generating numeric signals rather than named entities. The default preset produces a signal centered around 30, with a slow sine trend, Gaussian noise, and occasional spikes. It uses only the Python standard library, supports configurable measurement names, field names, tags, and point resolution, and can compose multiple waveform types together. Supported waveform types include sine, square, triangle, sawtooth, noise, and spike.

That makes it useful for testing dashboards, threshold checks, alerting behavior, anomaly detection, and any workflow that requires a predictable yet non-static stream of numeric values. A line with trend, noise, and the occasional spike gives you something closer to the patterns you usually care about when working with time series data.
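As a rough illustration of what a composed signal like the default preset looks like, the sketch below stacks a baseline, a slow sine trend, Gaussian noise, and occasional spikes using only the standard library. It is not the plugin’s code: the function name and every parameter value other than the center of 30 are assumptions chosen for the example.

```python
import math
import random


def sample_signal(
    t_seconds: float,
    rng: random.Random,
    baseline: float = 30.0,          # center of the signal, as in the default preset
    amplitude: float = 5.0,          # assumed sine amplitude
    period_seconds: float = 3600.0,  # assumed sine period
    noise_std: float = 0.5,          # assumed Gaussian noise level
    spike_probability: float = 0.01, # assumed chance of a spike per sample
    spike_size: float = 15.0,        # assumed spike height
) -> float:
    """Return one sample built from stacked waveform components."""
    value = baseline
    value += amplitude * math.sin(2 * math.pi * t_seconds / period_seconds)
    value += rng.gauss(0.0, noise_std)
    if rng.random() < spike_probability:
        value += spike_size
    return value


rng = random.Random(42)  # seeded so repeated runs produce the same series
samples = [sample_signal(t, rng) for t in range(0, 3600, 10)]
print(len(samples), min(samples), max(samples))
```

Setting the noise and spike parameters to zero leaves a clean sine wave, which is a convenient way to check a dashboard before adding the messier components back in.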

The simplest version uses the default preset:

influxdb3 create database signals

influxdb3 create trigger \
  --database signals \
  --path "gh:influxdata/signal_generator/signal_generator.py" \
  --trigger-spec "every:10s" \
  signal_basic

Once the trigger has run, query the latest generated values:

influxdb3 query \
  --database signals \
  "SELECT time, value FROM signal ORDER BY time DESC LIMIT 10"

You can also aggregate the generated signal over time:

influxdb3 query \
  --database signals \
  "SELECT
     time_bucket(time, INTERVAL '1 minute') AS minute,
     AVG(value) AS avg_value,
     MIN(value) AS min_value,
     MAX(value) AS max_value
   FROM signal
   WHERE time > now() - INTERVAL '1 hour'
   GROUP BY minute
   ORDER BY minute DESC"

For custom waveforms, the plugin can be configured with JSON arguments through InfluxDB 3 Explorer or the Processing Engine API. That lets you define signals for different measurements, fields, and tags, or stack waveforms together to create the shape you want.

For example, you might create one signal that looks like a temperature sensor, another that behaves like CPU utilization, and another that emits occasional spikes to test an alerting path. Because each trigger can have its own configuration, you can build out a small set of synthetic streams that exercise different parts of your system.

Lightweight data generation for InfluxDB 3

The Bird Tracking Simulator and Signal Generator are small plugins, but they solve a useful problem: they make it easy to get fresh time series data flowing through InfluxDB 3 with very little setup, allowing you to test your deployment and ensure data is flowing to and from every system downstream of your InfluxDB instance.

Use the Bird Tracking Simulator when you want moving, entity-oriented telemetry with tags and location fields. Use the Signal Generator when you want numeric signal data for dashboards, alerts, thresholds, and processing workflows.

Check out the plugins in the InfluxDB 3 plugin repository, try them on the hardware you already have, and use them as a quick way to exercise InfluxDB 3, the Processing Engine, and the systems connected to them.

Anomaly Detection and Forecasting That Learns From Every Write in InfluxDB

Cole Bowden, Ryan Nelson (InfluxData) — Thu, 04 Jun 2026 08:00:00 +0000

For many operational time series workloads, machine learning can’t operate in the traditional way, where data is compiled once and models are trained offline. Sensor readings, infrastructure metrics, application telemetry, energy data, industrial measurements, and financial ticks all share a basic property: the next datapoint is more useful when the system can respond to it immediately (or at least close to immediately). When a model learns in the same flow that ingests data and reacts to incoming data as it’s written, things like anomaly detection, short-horizon forecasting, and adaptive thresholding all become a lot more useful.

Enter three new River-based plugins for InfluxDB 3: the River Anomaly Detector, the River Auto-Profiler, and the River Forecaster.

These plugins are built for the InfluxDB 3 Processing Engine and leverage River, a Python library for online machine learning. If you’re unfamiliar with it, the Processing Engine is an embedded Python VM that runs inside InfluxDB 3 and can execute plugin code on writes, schedules, or HTTP requests; it also provides an in-memory cache for stateful applications. For write-triggered plugins, InfluxDB 3 can send batches of data to a plugin as data is flushed through the write-ahead log. River, meanwhile, is designed for models that learn from streaming data incrementally, including anomaly detection, drift detection, and time series forecasting.

That combination makes InfluxDB 3 and River a perfect match. These plugins bring small, per-series, constantly-updating models directly into the write path, then write the resulting profiles, anomalies, and forecasts back into InfluxDB tables where they can be queried like any other time series data or combined with other triggers and plugins to kick off informed actions. Better yet, they do this all within InfluxDB, eliminating the need for extra infrastructure, servers, and data pipelines to fuel your ML models. In this blog, we’re going to talk about all three new River plugins for InfluxDB, and how they can not only simplify your ML stack, but lead to faster insights through online modeling.

River Anomaly Detector: multiple ways to detect problematic data

Let’s start with the River Anomaly Detector. At a high level, the plugin monitors numeric fields and writes anomaly rows to the table _anomalies.{source_table}. It supports rolling Z-score detection, seasonal detection, and adaptive window (ADWIN)-based drift detection. Each unique combination of table, tags, and field has its own detector state, and models are updated incrementally as new observations arrive.

The rolling detector tracks an exponentially weighted mean and variance, then flags values that exceed the learned range by a configurable number of standard deviations. The seasonal detector maintains separate time-based buckets, either 24 hour-of-day buckets or 168 hour-of-week buckets, and it compares new values against the bucket that matches the current timestamp. The ADWIN detector watches for changes in the statistical properties of the stream, which is useful when the issue is a behavior shift rather than a single spike.
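The rolling Z-score idea can be sketched in a few lines of plain Python. This is an illustration of the technique under stated assumptions, not the plugin’s implementation: the class name, the fading factor, the threshold, and the warm-up length are all hypothetical.

```python
import math


class RollingZScore:
    """Exponentially weighted mean/variance with a standard-deviation threshold."""

    def __init__(self, fading_factor: float = 0.05, threshold: float = 3.0, warmup: int = 10):
        self.alpha = fading_factor  # weight given to each new observation
        self.threshold = threshold  # allowed deviation, in standard deviations
        self.warmup = warmup        # observations to learn from before flagging
        self.mean = 0.0
        self.var = 0.0
        self.n = 0

    def update(self, x: float) -> bool:
        """Score x against the learned range, then learn from it."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.var)
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                is_anomaly = True
        if self.n == 0:
            self.mean = x
        else:
            # Incremental exponentially weighted mean and variance update.
            delta = x - self.mean
            self.mean += self.alpha * delta
            self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        self.n += 1
        return is_anomaly


detector = RollingZScore()
for value in [10.0, 11.0] * 25:
    detector.update(value)
print(detector.update(50.0))  # a large jump relative to the learned range
```

Note that the detector keeps learning from the point it just flagged, which mirrors the "learn from every observation" behavior described in this post.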

Detectors learn from every observation, and you can choose to enable one, two, or all three detector modes, but only the detectors active in the current mode can vote on whether a point is anomalous. This allows you to train the various forms of anomaly detection, then modify your trigger once enough data has been provided to the model. This way, a stream can accumulate enough seasonal history before the seasonal detector is allowed to participate in decisions, and ADWIN can keep learning even when the current mode is using a simpler rolling Z-score path. You don’t even need to specify which detector to use; run the plugin with a minimal trigger such as the following:

influxdb3 install package river

influxdb3 create trigger \
 --database mydb \
 --plugin-filename gh:influxdata/river_anomaly_detector/river_anomaly_detector.py \
 --trigger-spec "all_tables" \
 anomaly_detector

With that trigger in place, the plugin starts monitoring numeric fields in all incoming tables. The default behavior is to “auto-tune” and read recommendations from the River Auto-Profiler plugin, which we’ll dive into more in the next section. A typical query against the output might look like this:

SELECT
 time,
 host,
 field_name,
 original_value,
 detector_mode,
 rolling_mean,
 rolling_std,
 rolling_deviation,
 rolling_threshold,
 seasonal_bucket,
 drift_detected
FROM "_anomalies.cpu"
ORDER BY time DESC
LIMIT 20;

That output is deliberately detailed: you get the original value, the active detector mode, rolling statistics, seasonal statistics when available, and ADWIN drift information. The plugin only writes rows where a datapoint is determined to be an anomaly, so the anomaly table is an event stream rather than a full copy of the source data.

For a stable metric, the detector may run with a lower Z-score threshold because small deviations are meaningful. For a noisy metric, it may use a higher threshold to avoid producing noise. For a metric with a strong weekly pattern, seasonal buckets can separate “expected at 2 p.m. on Monday” from “expected in general.” For a metric that is drifting, ADWIN can participate so the detector can respond to changes in the stream rather than treating every shift as a one-off outlier.
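The difference between the 24 hour-of-day buckets and the 168 hour-of-week buckets comes down to how a timestamp is mapped to a bucket index. The sketch below shows one plausible mapping; the function names are hypothetical, and the choice of Monday as the start of the week is an assumption that follows Python’s `weekday()` convention rather than anything the plugin documents here.

```python
from datetime import datetime, timezone


def hour_of_day_bucket(ts: datetime) -> int:
    """One of 24 buckets: the hour within the day."""
    return ts.hour


def hour_of_week_bucket(ts: datetime) -> int:
    """One of 168 buckets: the hour within the week, Monday 00:00 being bucket 0."""
    return ts.weekday() * 24 + ts.hour


monday_2pm = datetime(2026, 6, 1, 14, 0, tzinfo=timezone.utc)
tuesday_2pm = datetime(2026, 6, 2, 14, 0, tzinfo=timezone.utc)

# Same hour-of-day bucket, different hour-of-week buckets.
print(hour_of_day_bucket(monday_2pm), hour_of_day_bucket(tuesday_2pm))
print(hour_of_week_bucket(monday_2pm), hour_of_week_bucket(tuesday_2pm))
```

With daily buckets, Monday at 2 p.m. and Tuesday at 2 p.m. share statistics; with weekly buckets, each is compared only against its own history.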

You can also make the mode explicit. For example, if you only care about a specific table and you know it has strong daily or weekly seasonality on a couple of fields you want to monitor, you can specify this and force the plugin to only use seasonal detection:

influxdb3 create trigger \
 --database mydb \
 --plugin-filename gh:influxdata/river_anomaly_detector/river_anomaly_detector.py \
 --trigger-spec "table:cpu" \
 --trigger-arguments \
   'include_fields=temperature humidity' \
   'rolling_std_threshold=3.0' \
   'enable_seasonal=true' \
 anomaly_detector_cpu

The detector can encode understanding in the database itself, close to the data, while still leaving room for explicit configuration when you know exactly what you want. Whether you want adaptive detection, rolling Z-scores, seasonal detection, or some combination of the three, the River Anomaly Detector plugin can make it happen, and it’s truly as simple as defining the tables and fields you want it to operate on.

River Auto-Profiler: the control plane for per-series tuning

The Auto-Profiler is the least flashy of the three plugins, but it is the one that makes the anomaly detector more practical at scale.

A static threshold is easy to understand and easy to deploy. It is also usually wrong somewhere. A metric that barely moves, a metric with high variance, a metric with a weekly pattern, and a metric undergoing a slow drift should not all use the same anomaly detection settings. You can tune each field by hand, but that does not scale well when your schema grows or when the behavior of a series changes over time.

The Auto-Profiler addresses that by incrementally profiling each numeric series and writing recommendations to _meta.series_profiles. It tracks exponentially weighted mean and variance, skewness, kurtosis, write interval, and seasonal variance buckets. After a short calibration phase, it writes profile snapshots that include a pattern label, recommended detector mode, threshold, fading factors, seasonality strength, trend strength, and maturity flags.

Creating the trigger is intentionally simple:

influxdb3 create trigger \
 --database mydb \
 --plugin-filename gh:influxdata/river_auto_profiler/river_auto_profiler.py \
 --trigger-spec "all_tables" \
 auto_profiler

Then you can inspect the current profile state:

SELECT
 time,
 source_table,
 host,
 field_name,
 observations,
 pattern_label,
 recommended_detector_mode,
 recommended_threshold,
 recommended_fading_factor,
 seasonality_strength,
 trend_strength,
 profile_mature,
 seasonality_ready
FROM "_meta.series_profiles"
ORDER BY time DESC
LIMIT 20;

The classifier is intentionally transparent. It labels streams as stable, noisy, trending, seasonal, or bursty and maps those labels to detector modes such as zscore_low, zscore_high, zscore_adaptive, zscore_conservative seasonal, or zscore_conservative adwin. Seasonality detection uses hourly or weekly variance buckets, and trend detection compares fast and slow exponentially weighted means.
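Comparing a fast and a slow exponentially weighted mean is a simple way to expose a trend: the fast mean follows recent values closely, the slow mean lags behind, and the gap between them grows when the series keeps moving in one direction. The sketch below illustrates the idea only; the function name and both smoothing factors are assumptions, and the plugin may normalize or scale its `trend_strength` differently.

```python
def trend_gap(values: list[float], fast_alpha: float = 0.3, slow_alpha: float = 0.03) -> float:
    """Return fast EW mean minus slow EW mean after processing all values."""
    fast = slow = values[0]
    for x in values[1:]:
        fast += fast_alpha * (x - fast)  # reacts quickly to new values
        slow += slow_alpha * (x - slow)  # reacts slowly, so it lags a trend
    return fast - slow


flat = [5.0] * 100
rising = [float(i) for i in range(100)]
falling = [float(-i) for i in range(100)]

print(trend_gap(flat), trend_gap(rising), trend_gap(falling))
```

A gap near zero suggests no sustained direction, while a consistently positive or negative gap suggests an upward or downward trend.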

That transparency is useful. If a profile says a series is seasonal, you can look at the seasonality_strength and seasonal_buckets_filled fields. If it says a series is trending, you can inspect trend_strength. If the profile is not mature yet, downstream consumers can fall back to safer defaults.

The Auto-Profiler also tunes thresholds using observed exceedance behavior. It tracks how often observations fall outside mean ± threshold × std and adjusts the threshold toward an approximate target anomaly rate, with bounds to avoid unbounded sensitivity changes. That does not make anomaly detection “automatic” in the magical sense. It makes it adaptive in the operational sense: the system can use what it has learned about each series to pick a more appropriate starting point.
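One way to picture that tuning loop: measure the fraction of recent observations that fell outside the band, then nudge the threshold up if that fraction is above the target and down if it is below, clamping the result. The sketch below is a hypothetical illustration; the function name, step size, target rate, and bounds are assumptions and not the plugin’s actual values.

```python
def tune_threshold(
    threshold: float,
    exceedance_rate: float,
    target_rate: float = 0.01,   # assumed target anomaly rate
    step: float = 0.05,          # assumed relative adjustment per update
    min_threshold: float = 2.0,  # assumed lower bound
    max_threshold: float = 6.0,  # assumed upper bound
) -> float:
    """Move the threshold toward the target exceedance rate, within bounds."""
    if exceedance_rate > target_rate:
        threshold *= 1 + step  # too many exceedances: widen the band
    elif exceedance_rate < target_rate:
        threshold *= 1 - step  # too few exceedances: tighten the band
    return max(min_threshold, min(max_threshold, threshold))


print(tune_threshold(3.0, exceedance_rate=0.05))  # noisy series, band widens
print(tune_threshold(3.0, exceedance_rate=0.0))   # quiet series, band tightens
```

The bounds are what keep a long quiet period or a long noisy period from pushing the threshold to an extreme.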

The most important integration is with the anomaly detector. With both plugins deployed, the profiler writes per-series recommendations, and the anomaly detector consumes those recommendations to adapt to your data, weed out anomalies, and minimize noise.

River Forecaster: short-horizon forecasts from the write stream

The River Forecaster takes the same online-learning pattern and applies it to forecasting. The plugin uses River’s SNARIMAX model, an online ARIMA-style model, to learn from incoming values and periodically write multi-step forecasts to _forecasts.{source_table}. Like the anomaly detector, it keeps a separate model per table/tag/field series. Unlike the anomaly detector, it requires explicit table selection through include_tables, which is a necessary guardrail for forecast cardinality.

A trigger for the forecasting plugin might look like this:

influxdb3 create trigger \
 --database mydb \
 --plugin-filename gh:influxdata/river_forecaster/river_forecaster.py \
 --trigger-spec "all_tables" \
 --trigger-arguments \
   "include_tables=system_cpu system_memory" \
   "include_fields=idle used available" \
   "max_series=100" \
   "default_horizon=24" \
   "log_forecasts=true" \
 forecaster

By default, the model waits for a warm-up period before producing forecasts. The forecast horizon is also derived from the stream: the plugin tracks each model’s write interval and uses it to choose how many future steps are needed to cover a target forecast window, which defaults to one hour. A series written every 10 seconds therefore needs many more forecast steps than a series written every five minutes, because the time window is the same but the step size is different.

You can query forecasts like this:

SELECT
 time,
 host,
 field_name,
 horizon_step,
 forecast_value,
 horizon_total,
 observations
FROM "_forecasts.system_cpu"
ORDER BY time DESC
LIMIT 12;

Each forecast row receives a future timestamp based on the last observed timestamp plus horizon_step * write_interval_seconds. The output includes the forecasted value, the step number, total horizon length, and the number of observations the model has learned from.
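The relationship between the forecast window, the write interval, and the future timestamps can be worked through with a short calculation. The helper names are hypothetical, and rounding the step count up is an assumption; only the one-hour window and the `horizon_step * write_interval_seconds` offset come from the description in this post.

```python
import math
from datetime import datetime, timedelta, timezone


def horizon_steps(window_seconds: float, write_interval_seconds: float) -> int:
    """Number of forecast steps needed to cover the target window."""
    return max(1, math.ceil(window_seconds / write_interval_seconds))


def forecast_timestamps(last_observed: datetime, write_interval_seconds: float, steps: int) -> list[datetime]:
    """Future timestamp for each step: last observation plus step * interval."""
    return [
        last_observed + timedelta(seconds=step * write_interval_seconds)
        for step in range(1, steps + 1)
    ]


one_hour = 3600
print(horizon_steps(one_hour, 10))   # series written every 10 seconds
print(horizon_steps(one_hour, 300))  # series written every five minutes

last_seen = datetime(2026, 6, 4, 8, 0, tzinfo=timezone.utc)
stamps = forecast_timestamps(last_seen, 300, horizon_steps(one_hour, 300))
print(stamps[0], stamps[-1])
```

Both series cover the same hour, but the faster one does so with far more rows per forecast, which is worth keeping in mind when estimating output volume.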

The forecaster also self-throttles per model. It produces a forecast, stores those predictions in memory, and then compares them step by step against incoming actuals. A new forecast is produced once the previous forecast horizon has been consumed, so evaluation covers the full horizon rather than a partial subset.

That makes the plugin useful for near-term operational questions. For the example above (forecasting CPU and system memory utilization), these questions might be:

Is memory usage expected to cross a threshold in the next hour?
Is a sensor trending toward a range that usually precedes maintenance?
Is the current CPU-idle forecast consistent with what the anomaly detector is seeing?

These are not the only forecasting questions you might ask of time series data, but they are the kind that fit well in a write-triggered architecture.

Using the River plugins with InfluxDB

With InfluxDB 3’s embedded Processing Engine, getting started with ML tasks that are usually difficult or complicated is simpler and faster. For all three of these River ML plugins, you can view the code and README files within the InfluxDB 3 plugin repository, with full documentation for configuration and recommended use. Installing them is as simple as defining and starting the triggers within the Processing Engine.

Make sure you’re intentional about which tables and fields you monitor. You should think about cold starts, profile maturity, and whether a given field has enough regularity for short-horizon forecasting. These questions exist whenever you’re deploying ML techniques on data—this isn’t magic.

Once you have the plugins up and running, everything should work seamlessly as described above, adapting and learning from the data written to your tables. All three plugins’ outputs remain in InfluxDB. Profiles are written to _meta.series_profiles. Any detected anomalies are written to _anomalies.{source_table}. Forecasts are written to _forecasts.{source_table}.

That means the operational burden is small, and you can query profiles, anomalies, forecasts, and raw data with SQL. You can visualize your River ML-generated data in dashboards alongside the raw and processed data you’ve written into InfluxDB, join different datasets together, and inspect behavior, anomalies, and forecasts without needing to involve any other services or platforms. You don’t need additional hardware or pipelines or self-written plugins to extract your time series data into an offline system for ML. It may just be the easiest way to leverage your time series data for machine learning out there. By embedding these plugins within InfluxDB, your training workflows and ML insights can now live closer to the raw data than ever before.

Getting Started with Home Assistant Webhooks & Writing to InfluxDB

Cole Bowden (InfluxData) — Tue, 28 Apr 2026 08:00:00 +0000

If you’re already running or are familiar with Home Assistant, you’ve likely worked with integrations, maybe a few automations, and possibly MQTT as a way to wire devices together. But webhooks add another layer of flexibility that lets you level up your smart home into a fully-customized, intelligent network. Instead of relying on built-in integrations and being confined to the same local network, you can let external devices and services push events directly into Home Assistant. This gives you a simple way to build custom flows: a device sends a webhook, Home Assistant receives it, and then you decide what happens next. It’s a lightweight way to connect systems, even when built-in integrations may be lacking.

Once you have the webhook flow in place, the next question is what to do with the data generated from your webhook calls, where to store it, and how to best leverage it. That’s where InfluxDB fits in. It’s built specifically for time series data, which means it’s designed to handle continuous streams of time-stamped events like the ones generated by a smart home using Home Assistant. Instead of just reacting in the moment, you can store that data, query it, and build a clearer picture of how your system behaves. Data processing and forecasting build an even more advanced understanding of your system over time.

In this blog, we’ll walk through both sides of that setup. First, we’ll use webhooks in Home Assistant to create flexible, event-driven flows between devices and services. Then we’ll connect that stream of data to InfluxDB and its Processing Engine so you can go beyond real-time reactions and start working with your data in a more structured way.

What is Home Assistant?

Home Assistant is an open source platform that ties all your smart home devices together in one place. It runs locally, gives you control over how devices interact, and lets you build automations based on events happening throughout your home. Instead of relying on separate apps or cloud services for each device, everything feeds into a single system where you can define your own logic. That can be as simple as turning on lights at sunset or as involved as coordinating and controlling multiple devices based on sensor data, schedules, forecasts, and external inputs.

It’s easy to get started with Home Assistant by connecting a few common integrations. Nearly all smart lights, thermostats, and motion sensors have existing integrations, and building simple automations on those integrations, like having lights turn on if a motion sensor detects movement, is straightforward from there. As your setup grows, you can layer in more conditions, tie multiple devices together, and start building routines.

At some point, though, you may want to bring in data or events from devices and services that don’t have a native integration. That’s where webhooks come in. They give you a simple way to send events directly into Home Assistant from anything that can make an HTTP request, which opens the door to more custom, event-driven flows without needing to build a full integration.

Setting Up a Home Assistant Webhook

To get started on the Home Assistant side of things, a webhook is just another type of trigger. This means you can create it as you would any other trigger type: navigate to automations, create an automation, and add a webhook trigger. Home Assistant has documentation on exactly how this trigger works. You must define a webhook ID when you create a webhook trigger, and you’ll need to include that ID when you invoke the webhook. Just like with MQTT triggers in Home Assistant, webhook triggers also support payloads that contain additional data, and you can use this payload in downstream automation if desired.

For testing purposes, make sure that a downstream action is invoked by the trigger. Using one of your other devices connected to Home Assistant is often the most straightforward option, whether that’s switching a light on/off or sending a push notification to an Apple device via iCloud.

Then, to invoke your trigger, simply call your webhook. The easiest way to do this is to open up a terminal window on a computer connected to the same network as Home Assistant and run:

curl -X POST -d 'key=value' https://"your-home-assistant":8123/api/webhook/"id"

Any other means of sending an HTTP POST request will work fine. Note that you’ll need to replace "id" with the webhook ID that you defined when you created the trigger and "your-home-assistant" with the local IP of the device running Home Assistant. The ‘key=value’ is where you can provide your payload. If you want multiple keys and values, you can separate them with &, or you can provide it in a JSON format, which is covered in the Home Assistant documentation.
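The two payload styles differ only in how the request body is encoded and which content type is sent. The sketch below builds both bodies with the Python standard library without sending anything; the field names are made up for illustration.

```python
import json
from urllib.parse import urlencode

# Hypothetical payload fields, used only for this example.
fields = {"sensor": "garage", "state": "open"}

# Form-encoded body: keys and values joined with "=" and separated by "&".
form_body = urlencode(fields)
form_headers = {"Content-Type": "application/x-www-form-urlencoded"}

# JSON body: the same fields serialized as a JSON object.
json_body = json.dumps(fields)
json_headers = {"Content-Type": "application/json"}

print(form_body)
print(json_body)
```

Either body can then be passed to whatever HTTP client you prefer, along with the matching headers.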

If you want to send HTTP requests from devices or servers that aren’t on your home network, you’ll need to make sure you set the local_only option to “false” and port forward the port Home Assistant uses for webhooks, which is 8123 by default. Home Assistant’s documentation recommends some security practices that are worth repeating: because allowing external traffic to invoke the webhook trigger is inherently insecure, make sure that any downstream actions can’t be destructive or problematic if a bad actor sends a request.

Full-Stack Example: Energy Price Monitoring

Suppose you want to monitor energy prices on the grid and use those prices to inform when you should turn certain devices in your smart home on or off.

You’ll need to start with a script to monitor grid pricing. Depending on where you live and how your electricity is billed, you may be able to simply query your utility or fetch the relevant information periodically from a website. Run a small server or device that can handle this task, and schedule it with cron to run periodically. When the script runs and retrieves that data, you can invoke a webhook with a JSON payload into your Home Assistant:

import requests

WEBHOOK_URL = "https://192.168.1.20:8123/api/webhook/electricity_price"
PRICE_THRESHOLD_KWH = 0.20

# fetch local electricity prices, then...

payload = {
    "price_per_kwh": current_electricity_price,
    "threshold": PRICE_THRESHOLD_KWH,
}
response = requests.post(
    WEBHOOK_URL,
    json=payload,
    timeout=10,
)
response.raise_for_status()

Then, in Home Assistant, your trigger could be set up as:

alias: Energy price spike response
description: Adjust to eco mode when electricity prices go above threshold

triggers:
  - trigger: webhook
    webhook_id: electricity_price
    allowed_methods:
      - POST
    local_only: false

conditions:
  - condition: template
    value_template: >
      {{ trigger.json.price_per_kwh | float >= trigger.json.threshold | float }}

actions:
  - action: switch.turn_off
    target:
      entity_id:
        - switch.ev_charger
        - switch.garage_ac

With a scheduled Python script and the Home Assistant trigger, you can now run a scheduled task to check the web, invoke the trigger, pass in relevant data as a payload, and have other devices connected to Home Assistant take necessary actions. The above example demonstrates switching off some devices when electricity prices are high, but a few minor adjustments could instead turn devices on when prices drop.

Adding more intelligence to your smart home with InfluxDB

Webhooks and automation are a good start, but there’s still much more you can do. Data is being collected and used to trigger various events around the house, but what do you do with that data after it’s used to set off a trigger? If you’re turning off EV charging and auxiliary air conditioning when electricity is particularly pricey, what impact is that having?

Fortunately, Home Assistant has an integration with InfluxDB that can help you take your system from smart home to smarter home with minimal setup. Install InfluxDB, add the Home Assistant integration for InfluxDB, then configure the authentication to an existing InfluxDB instance. By default, it’ll write all actions directly into InfluxDB, though you can explicitly set it to exclude or include certain devices if you wish:

influxdb:
  api_version: 2
  ssl: false
  host: 192.168.1.50
  port: 8181
  token: "YOUR_INFLUXDB_TOKEN"
  organization: home
  bucket: home_assistant

To write the data from the earlier webhook script into InfluxDB, we can use the InfluxDB 3 Python client:

from influxdb_client_3 import InfluxDBClient3, Point
import requests

WEBHOOK_URL = "https://192.168.1.20:8123/api/webhook/electricity_price"
PRICE_THRESHOLD_KWH = 0.20

# Scheme included explicitly; http matches ssl: false in the Home Assistant config above.
INFLUXDB_HOST = "http://192.168.1.50:8181"
INFLUXDB_TOKEN = "your_influxdb_token"
INFLUXDB_DATABASE = "home"

def main():
    client = InfluxDBClient3(
        host=INFLUXDB_HOST,
        token=INFLUXDB_TOKEN,
        database=INFLUXDB_DATABASE,
    )

    # fetch local electricity prices, then...

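    # Pass the client explicitly so write_to_influx can use it.
    write_to_influx(client, current_electricity_price)
    post_request_to_home_assistant(current_electricity_price)

def post_request_to_home_assistant(price):
    # Send the current price and threshold to the Home Assistant webhook.
    payload = {
        "price_per_kwh": price,
        "threshold": PRICE_THRESHOLD_KWH,
    }
    response = requests.post(
        WEBHOOK_URL,
        json=payload,
        timeout=10,
    )
    response.raise_for_status()

def write_to_influx(client, price):
    # Write a single point to the grid_prices table.
    point = (
        Point("grid_prices")
        .field("price_per_kwh", float(price))
    )
    client.write(point)

if __name__ == "__main__":
    main()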

With all the data for triggers and actions, you can retain a long-term memory of what your smart home is doing. With the InfluxDB Processing Engine, you can do further analysis and processing of data as it’s written.

To continue with the example above, you could connect your electricity meter to Home Assistant, then persist the meter data into InfluxDB. That data, combined with records of when your webhook trigger wrote information about current electricity prices, could allow you to see how your home adapts in real-time to fluctuations in grid prices. If everything is set up correctly, you should see that spikes in electricity prices lead to lower utilization, and vice versa.

Better yet, you could use the Prophet forecasting plugin, trained on the same data, to create a smart home that isn’t just reactive but predictive. By persisting smart home data to InfluxDB, you can train models on that data to make intelligent predictions. For example, you could forecast electricity prices relatively easily. First, create an instance of the forecasting plugin:

influxdb3 create trigger \
  --database home \
  --path "gh:influxdata/prophet_forecasting/prophet_forecasting.py" \
  --trigger-spec "every:1h" \
  --trigger-arguments "measurement=grid_prices,field=price_per_kwh,window=30d,forecast_horizont=12h,target_measurement=grid_price_forecast,model_mode=train,unique_suffix=home_prices_v1,seasonality_mode=additive,inferred_freq=1H" \
  grid_price_forecast

Then enable it:

influxdb3 enable trigger \
  --database home \
  grid_price_forecast

With forecasting enabled, there’s now a grid_price_forecast table that will be populated, which you can query to view predicted spikes in prices. You can use those predictions to run critical tasks around the house before prices spike, rather than simply shutting things off after prices increase.
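As a rough sketch of how predicted spikes could drive scheduling, the snippet below takes a list of (timestamp, predicted price) pairs and returns the hours that come before the first predicted spike, which is when flexible tasks could run. The forecast values are made up, and the tuple layout and the hours_before_first_spike helper are assumptions for illustration, not the plugin's actual output schema.

```python
from datetime import datetime, timedelta

PRICE_THRESHOLD_KWH = 0.20  # same threshold as the webhook example above


def hours_before_first_spike(forecast, threshold=PRICE_THRESHOLD_KWH):
    """Return the forecast timestamps that come before the first predicted spike.

    `forecast` is a list of (timestamp, predicted_price) tuples sorted by time.
    This layout is hypothetical; adapt it to the columns in your forecast table.
    """
    cheap_hours = []
    for ts, price in forecast:
        if price >= threshold:
            break  # stop at the first predicted spike
        cheap_hours.append(ts)
    return cheap_hours


# Made-up forecast values, purely for illustration
start = datetime(2026, 1, 1, 12, 0)
sample_forecast = [
    (start + timedelta(hours=i), price)
    for i, price in enumerate([0.12, 0.14, 0.18, 0.25, 0.31, 0.16])
]

run_window = hours_before_first_spike(sample_forecast)
print([ts.strftime("%H:%M") for ts in run_window])
```

In a real setup, the rows would come from querying the forecast table rather than from a hard-coded list.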

Continual improvement

If you’ve followed along with every part of this blog, you should have a full loop in place. A small service watches something outside your home, sends a periodic signal, Home Assistant handles the local response, and InfluxDB keeps a record of what happened so you can look back and improve it. None of the individual pieces are especially complicated, but putting them together gives you something more useful than a single automation. You’re building a system that can learn from its own behavior and get smarter over time.

Get started with InfluxDB 3 and its Home Assistant integration today.

Setting Up an MQTT Data Pipeline with InfluxDB

Cole Bowden (InfluxData) — Fri, 17 Apr 2026 08:00:00 +0000

In this blog, we’re going to take a look at how you can set up a fully-functioning, robust data pipeline to centralize your data into an InfluxDB instance by collecting and sending messages with the MQTT protocol. We’ll start with a brief overview of the technologies and protocols used in the pipeline, then dive into how you can connect, configure, and test them to ensure your data pipeline is fully functional. It’s going to be a long post, so let’s jump right in.

What is MQTT?

MQTT is an industry-standard, lightweight protocol for moving messages through a network of devices. It functions by having a broker, or multiple brokers, receive messages from individual devices (publishing clients) across the network, and publish those messages to external systems (destination clients) that are connected and listening to the broker. By categorizing messages into “topics,” systems that subscribe to specific topics can opt to receive only messages they’re interested in.

As a lightweight protocol with a number of prominent open source implementations, MQTT is an industry standard for a variety of use cases. It’s particularly common in Internet of Things (IoT) and Industrial IoT (IIoT) applications, but can be leveraged anywhere you have a distributed network of devices generating data or messages. This includes fleet management, home automation, real-time telemetry on computer hardware, and practically any use case where sensors generate data points periodically.

Why use InfluxDB for MQTT data?

If you’ve already concluded that the MQTT protocol is the right way to move your data from various devices into a centralized broker, odds are that you’re working with time series data. Time series data has a couple of key characteristics: it’s a sequence of data collected in chronological order, and all data points contain a timestamp. Most commonly, this also means there’s a large volume of data. Hundreds or thousands of sensors generating new data points every second can quickly turn into millions or billions of records per day. As the scale of data increases, the need for a specialized, purpose-built solution to handle this volume grows, too.
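To make that scale concrete, here is a small back-of-the-envelope calculation. It assumes each sensor emits one point per second, which is an illustrative figure rather than a measured one; the points_per_day helper is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def points_per_day(sensor_count, points_per_second_per_sensor=1.0):
    """Estimate how many points a fleet of sensors writes in one day."""
    return int(sensor_count * points_per_second_per_sensor * SECONDS_PER_DAY)


for sensors in (100, 1_000, 10_000):
    print(f"{sensors:>6} sensors -> {points_per_day(sensors):,} points/day")
```

Under that assumption, a thousand sensors already produce tens of millions of points per day, and ten thousand approach a billion.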

That’s where InfluxDB, the industry-leading time series database, comes in. InfluxDB is purpose-built for the time series data common in MQTT use case scenarios, delivering unparalleled performance and a number of dedicated features to make managing and working with your time series data as easy as possible.

Performance is critical because ingesting millions or billions of data points per day can strain most databases. Because time series databases like InfluxDB are optimized to handle that firehose of continuous data, they can scale to handle and ingest it with greater efficiency and lower costs. A custom-built storage engine eliminates snags that most other types of databases encounter, such as index maintenance and contention locks. Last-value caches and engine optimizations for timestamp-based filtering make retrieving recent data extremely efficient, so fresh data being written into InfluxDB can be queried in less than 10 milliseconds, minimizing time to insight (or as we like to call it, “time to awesome”). This ensures a real-time view of the data generated across your network of devices.

Time series functionality also makes managing and working with this data much easier, regardless of whether performance at scale is a concern. DataFusion, the SQL query engine embedded into InfluxDB 3, makes it easy to query with a language most data professionals and AI agents already know. With dedicated time-based functions, queries that look like this in a general-purpose database:

WITH hours AS (
  SELECT generate_series(
    date_trunc('hour', now() - interval '24 hours'),
    date_trunc('hour', now()),
    interval '1 hour'
  ) AS hour_bucket
),
sensors AS (
  SELECT DISTINCT sensor_id FROM sensor_data
),
hour_sensor AS (
  SELECT h.hour_bucket, s.sensor_id
  FROM hours h
  CROSS JOIN sensors s
),
agg AS (
  SELECT
    sensor_id,
    date_trunc('hour', time) AS hour_bucket,
    percentile_cont(0.95) WITHIN GROUP (ORDER BY temperature) AS p95
  FROM sensor_data
  WHERE time >= now() - interval '24 hours'
  GROUP BY sensor_id, hour_bucket
)
SELECT
  hs.hour_bucket,
  hs.sensor_id,
  COALESCE(a.p95, 0) AS p95
FROM hour_sensor hs
LEFT JOIN agg a USING (hour_bucket, sensor_id)
ORDER BY hs.sensor_id, hs.hour_bucket;

Can be shortened to this in InfluxDB:

SELECT
  date_bin_gapfill(INTERVAL '1 hour', time) AS hour,
  sensor_id,
  interpolate(percentile(temperature, 95)) AS p95
FROM sensor_data
WHERE time >= NOW() - INTERVAL '24 hours'
GROUP BY hour, sensor_id;

Admittedly, this is a cherry-picked example for a complicated function most users won’t use every day, but there are plenty that aren’t. The InfluxDB 3 processing engine comes with a host of built-in plugins for processing and transforming data as it’s written, monitoring and anomaly detection, forecasting, and alerting. Retention policies can be set at a database or table level, ensuring you keep data as long as it’s useful, and the downsampling plugin for the processing engine can help you keep your data at a lower resolution once it’s past the end of that policy. InfluxDB also has tons of connections to the ecosystem of data visualization tools, clients, and, critical for the purposes of this tutorial, integrates seamlessly with Telegraf, the data collection agent we’ll be using to move data from our MQTT broker into InfluxDB.

The MQTT -> InfluxDB pipeline

The architecture of this data pipeline is relatively straightforward, with data flowing in one direction throughout:

Devices, sensors, and anything else generating raw data are set up as MQTT publishing clients connected to the broker.
The MQTT broker receives the raw data from the various publishers and forwards it.
Telegraf subscribes to the published topics and then writes data into InfluxDB.
The InfluxDB processing engine handles all necessary transformations and makes the data immediately available for querying and visualization.

So let’s jump into specifics.

Setting Up the MQTT Broker and Clients

The first thing you’re going to need to do is install the MQTT technology of your choice on every device that’s going to be a publishing client, as well as on the server you want to act as your broker. Eclipse Mosquitto is a common open source option for MQTT that we’ll use in this guide, but any other MQTT client, such as HiveMQ, Paho, MQTTX, MQTT Explorer, or EasyMQTT, will also work great for this tutorial. The exact commands will differ depending on what you’re using, but the concepts will remain the same, as it’s a standardized protocol.

To install Eclipse Mosquitto:

On Linux, run: snap install mosquitto
On Mac: Install Homebrew, then run brew install mosquitto
On Windows: Go to the mosquitto download page and install from there

When you install Mosquitto, the installer will tell you the exact path of the configuration file. You’ll want to configure your broker first, and you should set up authentication if you don’t want to allow unauthenticated connections. A lack of authentication can be fine if you’re running everything on a local network where you’re not doing any port forwarding, but it’s not recommended if your devices are communicating over the internet.

There are many different ways to set up authentication with Mosquitto. One of the simplest is creating a password file with the mosquitto_passwd command, but you can read a full list of options on their documentation page for authentication methods. Whatever you settle on, if you decide to use some form of authentication, you’ll need to add the following line to your Mosquitto configuration file:

allow_anonymous false

There are many other configuration options in the documentation, and what you set and configure will depend on your use case, but some you may want to consider are:

persistence false - Because we’re writing to InfluxDB, we don’t need to persist messages to disk.
log_dest stdout - For setting up, testing, and debugging, outputting logs directly to the terminal makes things easier.

And of course, make sure your listener is configured on the same port for all devices. The default is 1883, but you can change this if desired.

Once you configure your broker, you can set up your publishing clients, and with whatever data you’re measuring, they can publish messages to the broker with the command:

mosquitto_pub -h "host" -t "topic" -m "value"

If you’re running everything on a single machine, your host will be localhost; otherwise, it’ll be the address where your broker is running. The value should be whatever you’re measuring and publishing at that moment.

Your topic can be whatever is appropriate to label that value. If you have different devices and different types of measurements for each device, it’s recommended to nest your topics and organize them in a way that makes logical sense. For example, if you have many different devices measuring, say, temperature and velocity, your topic arrangement may look like:

/sensors/vehicles/v1/device1/temp
/sensors/vehicles/v1/device1/velocity
/sensors/vehicles/v1/device2/temp
/sensors/vehicles/v1/device2/velocity

As long as you have a unique topic structure for each type of value being sent, we can parse and sort this into tags and fields with InfluxDB. For further information on setting up MQTT topics, there are plenty of great guides on the matter.
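If your publishing clients run Python, one way to script this is to build the nested topic from device metadata and hand it to mosquitto_pub. The sketch below is a minimal illustration that assumes mosquitto_pub is installed and on your PATH; the helper names (build_topic, build_publish_command, publish) are hypothetical, and only the -h, -t, and -m flags shown earlier are used.

```python
import shlex
import subprocess


def build_topic(device_type, version, device_name, metric):
    """Build a nested topic like /sensors/vehicles/v1/device1/temp."""
    return "/" + "/".join(["sensors", device_type, version, device_name, metric])


def build_publish_command(host, topic, value):
    """Return the mosquitto_pub argument list for one message."""
    return ["mosquitto_pub", "-h", host, "-t", topic, "-m", str(value)]


def publish(host, topic, value):
    """Publish a single value; requires mosquitto_pub and a reachable broker."""
    subprocess.run(build_publish_command(host, topic, value), check=True)


topic = build_topic("vehicles", "v1", "device1", "temp")
command = build_publish_command("localhost", topic, 21.5)
print(shlex.join(command))  # show the command without actually running it
```

Calling publish("localhost", topic, 21.5) would send the message; the example only prints the command so it can be inspected without a broker running.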

With your clients and broker configured, your clients publishing messages, and your broker receiving and forwarding those messages, you should be all set up for the MQTT portion of this data pipeline.

Installing InfluxDB

The next stage is moving your MQTT data into InfluxDB, which starts with installing InfluxDB. You can check out our docs on installing it here, but the simplest way to get started is to run the install script provided by InfluxData with:

curl -O https://www.influxdata.com/d/install_influxdb3.sh \
&& sh install_influxdb3.sh

This should work on every operating system and provide you with some simple options to get started with InfluxDB 3 Core or Enterprise. The installation script should also give you an admin token, which you’ll want to store somewhere safe so you can use it for authentication. If you’d like to further configure your InfluxDB 3 instance, the installation script should tell you where its files, including configuration files, were installed so you can adjust them, though it should run fine out of the box.

If you have Docker installed, you can also install the InfluxDB Explorer UI as part of this process, giving you an easy way to view, manage, and query your InfluxDB 3 instance. You can reach it by navigating to localhost:8888 in your browser, entering host.docker.internal:8181 for the server address, and providing the admin token.

Installing and Configuring Telegraf

With InfluxDB 3 installed and running, the last step to get the data pipeline operational is to install and configure Telegraf to connect our MQTT broker to InfluxDB. Telegraf installation varies by operating system and Linux distribution, so check out the Telegraf documentation on installation to find the right files or command to run.

If you’re on Mac or Linux, installation generates a default configuration file for you:

On Mac, if installed via Homebrew: /usr/local/etc/telegraf.conf
On Linux: /etc/telegraf/telegraf.conf

Otherwise, you’ll need to create an empty configuration file or generate one with telegraf config > telegraf.conf. Once you have located or created your configuration file, all that’s left to do is connect Telegraf to your MQTT Broker and InfluxDB.

Configuring the connection to InfluxDB is simple. Add these lines to the config file:

[[outputs.influxdb_v2]]
  urls = ["InfluxDB address & port"]
  token = "admin token"
  organization = "org name"
  bucket = "destination database"

The InfluxDB address and port should be wherever you have InfluxDB installed. If you’re running everything on the same machine, this will be http://127.0.0.1:8181; otherwise, it’ll be the IP address and port of the InfluxDB server.
Token is the admin token you copied from installation.
Organization can be whatever you’d like to name it.
Bucket should be the name of the database you’re writing all your MQTT data to. You don’t have to create the database first.

Setting up a connection to your MQTT broker is also straightforward:

[[inputs.mqtt_consumer]]
  servers = ["broker address"]
  topics = ["list of topics"]
  data_format = "value"
  data_type = "data_type"

  ## if you have username and password authentication for MQTT
  username = "username"
  password = "password"

The broker address is once again the address and port where your MQTT broker is running. If the broker runs on the same machine as Telegraf, this will be tcp://127.0.0.1:1883.
Topics is a comma-separated list of topics that you’re writing to.
Data type is the primitive data type being written: integer, float, long, string, or boolean.

This is all you need in your configuration file to have the full pipeline running! If you run telegraf with telegraf --config telegraf.conf, you should be able to send a message from an MQTT publisher and view that data in InfluxDB.

However, you can make some improvements in Telegraf’s configuration to help parse and organize your data by topic. By default, this configuration writes every message to the same table, with the full topic stored in a single tag column and a monolithic “value” column holding all your values, which isn’t a very good data model. With topic parsing and pivot processing added to the configuration, we can specify what part of the topic should define what table the data is written into, turn every level of the topic into a tag, and pivot on the last level of the topic so that each raw value is its own field:

[[inputs.mqtt_consumer]]
  servers = ["broker address"]
  topics = ["/sensors/#"]
  data_format = "value"
  data_type = "data_type"

  ## if you have username and password authentication for MQTT
  username = "username"
  password = "password"

  [[inputs.mqtt_consumer.topic_parsing]]
    measurement = "/measurement/_/_/_/_"
    tags = "/_/device_type/version/device_name/field"

[[processors.pivot]]
  tag_key = "field"
  value_key = "value"

This takes a value from the /sensors/vehicles/v1/device1/temp topic and writes it to the sensors table. The tag columns populate with device_type = vehicles, version = v1, device_name = device1, and temp is written as a field with the value of temp set to whatever your MQTT publisher wrote. You can modify this configuration as appropriate for your topics, and the documentation provides full information on everything that can be done.
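To see the mapping spelled out, here is a simplified Python imitation of what the topic parsing and pivot steps do to a single message. It is not Telegraf's implementation, just a sketch of the same idea: the parse_topic function is hypothetical, and it ignores details such as any extra tags Telegraf may add on its own.

```python
def parse_topic(topic, value,
                measurement_pattern="/measurement/_/_/_/_",
                tags_pattern="/_/device_type/version/device_name/field"):
    """Imitate, in simplified form, topic parsing followed by a pivot on `field`."""
    topic_parts = topic.split("/")
    measurement_parts = measurement_pattern.split("/")
    tag_parts = tags_pattern.split("/")
    if not (len(topic_parts) == len(measurement_parts) == len(tag_parts)):
        raise ValueError("topic and patterns must have the same number of segments")

    measurement = None
    tags = {}
    for segment, m_key, t_key in zip(topic_parts, measurement_parts, tag_parts):
        if m_key == "measurement":
            measurement = segment  # this topic level names the table
        if t_key not in ("_", ""):
            tags[t_key] = segment  # every named level becomes a tag

    # Pivot: the `field` tag becomes the field name, and the raw value its value
    field_name = tags.pop("field")
    return {"measurement": measurement, "tags": tags, "fields": {field_name: value}}


row = parse_topic("/sensors/vehicles/v1/device1/temp", 21.5)
print(row)
```

The leading slash in the topic produces an empty first segment, which is why both patterns also start with a slash.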

Further improvements

With MQTT data being published, parsed, and written into InfluxDB, you’ve fully set up an MQTT data pipeline! However, there’s a lot more you can do:

View and query your data with the InfluxDB Explorer UI, as discussed earlier.
Connect any one of the many client libraries to access your data and use it for downstream applications, or to a data visualization tool for dashboarding and insight into what’s being written.
Use the InfluxDB 3 processing engine for further transformations and processing of your data as it’s written.
Set up alerts, monitoring, forecasting, and more with the processing engine, too.

The final product

By integrating MQTT, Telegraf, and InfluxDB, you’ve constructed a robust, fully-functioning data pipeline capable of efficiently centralizing real-time telemetry. The lightweight MQTT protocol ensures that messages from your distributed network flow reliably to the broker, while Telegraf acts as the collection agent for seamless ingestion and transformation. Finally, InfluxDB provides the purpose-built storage and specialized features needed to query and visualize your data in minimal time. This architecture establishes a solid foundation for turning raw event streams into meaningful insights, minimizing your time to awesome.

Why Use a Purpose-Built Time Series Database

Cole Bowden (InfluxData) — Wed, 24 Dec 2025 08:00:00 +0000

A time series database has a straightforward definition: it’s a database purpose-built for efficiently ingesting, storing, and querying time series data. Time series data is any data with a timestamp, collected regularly or periodically, that you’ll often visualize on graphs where the X-axis is time. This definition doesn’t quite tell you what sets it apart from other types of databases, though. This blog is going to dive into the details of how various databases are architected and help you understand why you’d want to use a time series database to handle your relevant workloads.

A quick database history lesson

Databases have existed in some capacity since the 1960s, but the first true databases in the modern sense were relational, transactional databases like Ingres and the original SQL Server. These databases store and represent data in rows and columns, and the paradigms underpinning them still exist in their successors, transactional databases like Postgres and MySQL. You might see these databases described as handling OLTP (Online Transaction Processing) because they can quickly process and write new transactions to a dataset. They do this by writing each new row as one cohesive unit at the end of the table in storage, so the database’s size has a negligible impact on how long it takes to add a new row, no matter how large it gets.

The downside is that when you analyze larger volumes of data, even a simple filter like WHERE userID = 1 in a query could require opening up every single row to check if the userID is or is not 1. Modern transactional databases have a number of strategies to speed up queries like this, the most prominent being indexing, which can make retrieving single rows efficient. However, as you scale queries up and try to retrieve more rows for analysis, performance slows down because the engine needs to read a lot of unnecessary data on disk just to find the data the query is looking for.

For scenarios where that became undesirable, the next logical step in database development was columnar databases, which store data in columns instead of rows. You’ll also see them described as handling OLAP (Online Analytical Processing). In a columnar database, data for a given column is colocated in storage next to other data in that column. When you start querying your data with filters, joins, aggregations, and more sophisticated analytical logic, storing data in columns has a massive upside. The WHERE userID = 1 example from above only requires looking at the userID column. If the column is indexed, it should be sorted, allowing the engine to quickly find exactly the rows the user is looking for and then retrieve the other columns’ data for those rows.

The downside is that when you write a row, you have to find the “end” of each column and add that row’s data to each column, which is a slower process. It also means that when you go to retrieve a single row, you need to pull that row’s data out of each column in storage, which is also a little slower. With indexing and a wide variety of other optimizations, this isn’t a big deal if you’re writing a single row, but as write volume and frequency increase, you can hit bottlenecks, and read performance will suffer—sometimes greatly.
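The difference between the two layouts can be sketched with plain Python data structures. This is a toy model under heavy simplification (in-memory lists instead of on-disk pages, no indexes or compression), and the helper names are made up for illustration.

```python
rows = [
    {"userID": 1, "country": "US", "amount": 20.0},
    {"userID": 2, "country": "DE", "amount": 35.5},
    {"userID": 1, "country": "US", "amount": 12.5},
]

# Row-oriented: each record is stored together, so appending is one operation
row_store = list(rows)

# Column-oriented: all values of a column are stored together
column_store = {key: [r[key] for r in rows] for key in rows[0]}


def filter_row_store(store, user_id):
    # Every full row is touched, even though only userID is being checked
    return [r for r in store if r["userID"] == user_id]


def filter_column_store(store, user_id):
    # Only the userID column is scanned; other columns are read for matches only
    matches = [i for i, uid in enumerate(store["userID"]) if uid == user_id]
    return [{key: col[i] for key, col in store.items()} for i in matches]


def append_column_store(store, row):
    # A single logical write has to touch every column
    for key, value in row.items():
        store[key].append(value)


append_column_store(column_store, {"userID": 3, "country": "FR", "amount": 8.0})
print(filter_column_store(column_store, 1))
print(sum(column_store["amount"]))  # an aggregate reads just one column
```

Both filters return the same rows for userID 1; what differs is how much data each layout has to touch to read and to write.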

NoSQL & time series databases

As data has grown exponentially and use cases have become increasingly varied, the need for more specialized databases has caused even more divergence. Google released the non-relational (often called NoSQL) database BigTable in 2005, using key-value storage rather than rows and columns. Since then, countless NoSQL databases, including prominent names like MongoDB, Cassandra, and Neo4j, have emerged. These databases can perform a wide variety of functions, though many have the performance characteristics of OLTP databases: designed to handle and write large volumes of data, but for more specialized use cases, whether that involves more powerful and flexible schemas or modeling the data in more intuitive ways.

Some NoSQL databases use columnar storage to achieve efficient analytical query performance while not being strictly relational underneath the hood to better handle performance for their specific use case. The most prominent example of this happens to be our favorite: InfluxDB.

What time series databases do differently

Every database is built for a purpose, and time series databases are no different. Because they’re purpose-built to handle time series workloads, they’re able to do a lot of things that general-purpose transactional and analytical databases can’t.

In many general use cases, a single missing datapoint can be catastrophic. This means general-purpose databases have to have many checks in place to ensure that every write is handled correctly, with no interference from other writes and zero risk of data loss. With time series data, a single reading from a sensor that’s making 7200 measurements per hour likely isn’t mission-critical, and one missing datapoint won’t impact a user’s ability to monitor trends.

By not committing to full ACID compliance in favor of eventual consistency, a time series database can avoid tricky snags like contention locks. This paradigm shift enables huge improvements in write performance, allowing them to significantly outpace even highly performant OLTP databases in write throughput. The same paradigm shift also allows a time series database to make new data available in real-time for querying. There is a tradeoff between performance and durability. Many databases have to absolutely maximize durability for scenarios where a single row represents important business data. Time series databases can confidently sacrifice a small amount of durability for a massive gain in performance.

As another example, nearly all general-purpose databases use indexing to speed up their query performance. Even with sparse indexing, a strategy where only certain values are indexed, increasing cardinality (the number of unique values) can hamper performance. When the number of unique indexed values is so large that loading the index into memory takes up a meaningful portion of available memory, query performance suffers. If the index uses all of the memory, the database falls over.

InfluxDB 3 solves this problem by eschewing indexes. By storing relevant data in columnar storage, there are no concerns about running low on memory due to high cardinality creating bloated indexes. Though a lack of indexing would cause a hit to performance in a general use case, InfluxDB partitions and stores data on disk sorted by timestamp with numerous optimizations for time-based queries, so performance doesn’t suffer. Any queries that filter for certain time periods can efficiently prune all unnecessary data, and thanks to a number of time-based optimizations, an aggregation over a large time period is faster than it would be in a standard database. The lack of indexing allows InfluxDB to work with unlimited cardinality, which in turn means InfluxDB can store and query datasets with unlimited unique values. This allows InfluxDB to store UUIDs, IP addresses, time series data enriched with relational data, and more. Because of the many optimizations for time-based queries, what would be a pitfall instead accelerates performance.
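The idea of pruning by time can be illustrated with a toy example. The sketch below assumes a hypothetical layout of one partition per day and is not how InfluxDB is actually implemented; it only shows why a time filter lets the engine skip data without consulting an index.

```python
from datetime import date, datetime

# Hypothetical layout: one partition per day, each holding (timestamp, value) points
partitions = {
    date(2026, 1, 1): [(datetime(2026, 1, 1, 9), 20.1), (datetime(2026, 1, 1, 21), 19.4)],
    date(2026, 1, 2): [(datetime(2026, 1, 2, 9), 21.0), (datetime(2026, 1, 2, 21), 20.2)],
    date(2026, 1, 3): [(datetime(2026, 1, 3, 9), 22.3), (datetime(2026, 1, 3, 21), 21.7)],
}


def query_time_range(partitions, start, end):
    """Return points with start <= timestamp < end, plus how many partitions were read."""
    scanned = 0
    results = []
    for day, points in partitions.items():
        # Prune: skip any partition whose day lies entirely outside the range
        if day < start.date() or day > end.date():
            continue
        scanned += 1
        results.extend(p for p in points if start <= p[0] < end)
    return results, scanned


points, scanned = query_time_range(
    partitions, datetime(2026, 1, 2, 0), datetime(2026, 1, 2, 23, 59)
)
print(f"{len(points)} points from {scanned} of {len(partitions)} partitions")
```

The number of unique tag values never enters the picture: which partitions get read depends only on the time range in the query.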

Time-based functionality

The specific purpose of time series databases also means that they come with more powerful functionality and syntax for navigating time-based queries.

Take this example query written for Postgres, which finds the 95th percentile of temperature measured by sensors and gap-fills any hours where data may be missing:

WITH hours AS (
  SELECT generate_series(
    date_trunc('hour', now() - interval '24 hours'),
    date_trunc('hour', now()),
    interval '1 hour'
  ) AS hour_bucket
),
sensors AS (
  SELECT DISTINCT sensor_id FROM sensor_data
),
hour_sensor AS (
  SELECT h.hour_bucket, s.sensor_id
  FROM hours h
  CROSS JOIN sensors s
),
agg AS (
  SELECT
    sensor_id,
    date_trunc('hour', time) AS hour_bucket,
    percentile_cont(0.95) WITHIN GROUP (ORDER BY temperature) AS p95
  FROM sensor_data
  WHERE time >= now() - interval '24 hours'
  GROUP BY sensor_id, hour_bucket
)
SELECT
  hs.hour_bucket,
  hs.sensor_id,
  COALESCE(a.p95, 0) AS p95
FROM hour_sensor hs
LEFT JOIN agg a USING (hour_bucket, sensor_id)
ORDER BY hs.sensor_id, hs.hour_bucket;

In InfluxDB 3, the same query can be expressed as:

SELECT
  date_bin_gapfill(INTERVAL '1 hour', time) AS hour,
  sensor_id,
  interpolate(percentile(temperature, 95)) AS p95
FROM sensor_data
WHERE time >= NOW() - INTERVAL '24 hours'
GROUP BY hour, sensor_id;
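For intuition about what gap filling with interpolation produces, the following sketch fills missing hourly buckets by drawing a straight line between the nearest known values. It is a conceptual illustration, not the engine's implementation: the gapfill_interpolate helper is hypothetical, hours are plain integers, and buckets before the first or after the last known value are simply left empty.

```python
def gapfill_interpolate(bucketed, start_hour, end_hour):
    """Fill missing hourly buckets by linear interpolation between known neighbors.

    `bucketed` maps an integer hour to an aggregated value. Buckets before the
    first or after the last known value stay None.
    """
    hours = list(range(start_hour, end_hour + 1))
    filled = {h: bucketed.get(h) for h in hours}
    known = [h for h in hours if filled[h] is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for h in range(left + 1, right):
            filled[h] = filled[left] + step * (h - left)
    return filled


hourly_p95 = {0: 20.0, 1: 22.0, 4: 28.0}  # hours 2 and 3 have no data
print(gapfill_interpolate(hourly_p95, 0, 5))
```

Every hour in the range gets a row, and the two missing hours receive values on the line between their neighbors.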

Nanosecond timestamp precision, a wide variety of date and time functions, and easy integrations via Telegraf with all of the common methods for collecting time series data make InfluxDB easy to use. InfluxDB 3 makes recent writes immediately available for real-time analysis by sending them to a queryable, in-memory buffer—something ACID-compliant databases can’t do. Built-in tools for data lifecycle management with retention policies and easy downsampling also make it easy to save big on storage costs.

The tradeoff to all of these upsides is that a time series database is a specific tool that isn’t right for every use case. If any single row of data represents a piece of mission-critical business information, you shouldn’t use a time series database. If you have a small volume of data that a simple Postgres installation can handle well, you don’t need a time series database. If your data doesn’t have timestamps, you can’t use a time series database.

The takeaway

For scenarios where you’re collecting and analyzing large volumes of data over time, a time series database like InfluxDB is purpose-built to be the best tool for the job. Write throughput is unparalleled, datasets with high cardinality pose no issue, queries and analytics are highly performant, and you have a full suite of tools to make it easier to work with time series data. When a time series database is right for your workloads, no other type of database can compare. Get started with InfluxDB 3 Core or Enterprise for free today.

Getting Started with InfluxDB 3 Core: From Installation to First Query in 10 Minutes

Cole Bowden (InfluxData) — Tue, 11 Nov 2025 08:00:00 +0000

Getting started with any database technology can be daunting, and nothing is ever as easy as a snap of the fingers. With InfluxDB 3, we’ve made it as painless as possible. If you want to do some testing, development, or exploration, you’ve read the title: you should be up and running in under 10 minutes with very little hassle. This blog walks you through a couple different ways to download and get started with Core, including both quick start and the Docker container, and then guides you through loading data and running queries with the UI and CLI.

Option 1: Install & run the simple download

The fastest way to get started with InfluxDB 3 is to use a provided shell script. You can download and run the install script from the command line:

curl -O https://www.influxdata.com/d/install_influxdb3.sh \
&& sh install_influxdb3.sh

When prompted in your terminal window, enter 2 to select the simple download, and then 1 to select quick start.

Once you’ve done that, you can validate it’s installed with influxdb3 --version, then run the command influxdb3.

If this doesn’t work, the terminal output lists next steps: the first is to make sure the influxdb3 command is available in your shell by running source on your shell’s configuration file. Follow that instruction, and if the file doesn’t exist, run the touch command on that file before rerunning sh install_influxdb3.sh.

The following steps—generating an admin token and writing data—are covered in detail in this blog, so you can skip past the next section (installing with Docker) for more details.

Option 2: Install & run with Docker

Unsurprisingly, the prerequisite for using the Docker image is to have Docker installed. If you’re on a personal computer, it’s recommended to use Docker Desktop. If you’re on a Linux server or a remote instance where you’re only interacting via the command line, then you’ll want to use Docker Engine. Make sure one of the above is installed on your system.

Once you have Docker installed, open a terminal to download the latest InfluxDB 3 Core Docker image:

docker pull influxdb:3-core

Once the download is complete, you can start the Docker image with the following command:

docker run -it -p 8181:8181 --name influxdb3-container \
      --volume ~/.influxdb3_data:/.data --volume ~/.influxdb3_plugins:/plugins \
      influxdb:3-core \
      influxdb3

Note that this command names the container “influxdb3-container,” and we’re still using the built-in Quick Start to start InfluxDB with the simple influxdb3 command.

A note on quick start

Using the quick start and the default configuration options that come with it is the simplest way to get started. This will set your node ID to {hostname}-node, use the file object store, and store data in the ~/.influxdb directory. For production deployments and other circumstances where you want to configure how Influx runs, use the serve command and specify configuration options, including explicitly setting your node ID and object store.

Generate an admin token

Whether you’ve done the quick download or used Docker, you’re now running InfluxDB 3 Core. The next step is to generate an admin token to connect to and authenticate with the InfluxDB server. Open a new terminal window, and…

If you used the quick start, run:

influxdb3 create token --admin

If you used Docker, run:

docker exec -it influxdb3-container influxdb3 create token --admin

This will output a token and an HTTP request header that can be used to connect to your server. Make sure to save both of these somewhere safe, ideally with password management software of some kind. You can create other types of tokens, but here we’re generating an admin token to keep things simple and ensure we have all permissions for further testing and development.

Connect, load, & query data

With the server running and a token generated to connect to it, the last thing we need to do is actually connect.

Option 1: Explorer UI

For this guide, we use the Explorer UI, which is the simplest way to connect to InfluxDB. The UI is only available via Docker, so if you used the simple download earlier and don’t have Docker installed yet, you’ll need to install it first.

The UI comes in a separate Docker image, so to get started, open a new terminal window and download the latest version from Docker:

docker pull influxdata/influxdb3-ui

Then run the Docker container for Explorer:

docker run \
  --name influxdb3-explorer \
  --publish 8888:80 \
  --publish 8889:8888 \
  influxdata/influxdb3-ui \
  --mode=admin

Once the container is running, open your web browser (e.g., Chrome) and enter localhost:8888 into the address bar. This should load the UI.

Click on “configure server” towards the top right, then click on “connect your first server.” Enter a name of your choosing for your server (e.g., “Test Server”). Enter either localhost:8181 or host.docker.internal:8181 for your server URL, depending on whether you used the simple download or Docker container. Then enter the admin token you generated and copied earlier as your token, which should start with apiv3_.

Once you fill out all the fields, click “Add Server.” When you navigate back to System Overview in the UI, you should see that you’re connected to the server that you’re running on the machine. Your UI is up and running, so it’s time to write data.

To do so, navigate to the “Write Data” -> “Sample/Dev Data” dropdown on the left menu of the UI. This guide uses the noaa-weather sample dataset, but you can choose other samples or load your own CSV/JSON data. To load a sample dataset, simply click on it, then click “Write Sample Data.” Instructions for loading CSV/JSON data are provided in the UI if you want to use data of your own.

Once data is written, it’s time to query. Navigate to “Query Data” -> “Data Explorer” on the left menu of the UI, select the schema you loaded data into from the dropdown, and then you can start querying. To explore the schema, you can start with something simple, such as querying all records from within the past day:

SELECT * FROM weather
WHERE time >= now() - INTERVAL '1 day';

If you want to get a little more complicated, a query to check locations where weather hasn’t been sunny in the last day might look like:

SELECT location, condition FROM weather
WHERE
  time >= now() - INTERVAL '1 day'
  AND condition <> 'sunny'
ORDER BY location;

Just like that, you’ve started up InfluxDB 3 Core, loaded data, and queried that data.

Option 2: CLI

If you don’t want to install Docker or you’re using a machine without a UI, the CLI is the key way to write and query data. If you’re running the InfluxDB server in a Docker container, you’ll need to prefix your commands with docker exec -it influxdb3-container. Aside from that, it’ll work the same either way.
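One way to avoid retyping that prefix is a small shell wrapper. This is only a sketch: the function name influx and the USE_DOCKER variable are hypothetical, and the container name is the one used earlier in this guide.

```shell
# Hypothetical wrapper: run an influxdb3 CLI command on the host or in the container.
# Set USE_DOCKER=1 when the server runs in the influxdb3-container container.
influx() {
  if [ "${USE_DOCKER:-0}" = "1" ]; then
    # Same prefix as described above, followed by the CLI arguments.
    docker exec -it influxdb3-container influxdb3 "$@"
  else
    influxdb3 "$@"
  fi
}
```

With the wrapper defined, running influx followed by the usual arguments behaves like influxdb3 on the host, and setting USE_DOCKER=1 beforehand routes the same command through the container.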

Use the write command to load a sample dataset (different from the weather dataset above), replacing "AUTH_TOKEN" with the token you generated earlier. Note that this will automatically create a database named “weather” as well.

influxdb3 write \
  --token "AUTH_TOKEN" \
  --database weather \
  "$(curl --request GET https://docs.influxdata.com/downloads/bay-area-weather.lp)"

If you want to load your own data, you can use the line protocol to format and load any time-series dataset. Once it’s loaded, you can run a query command to read the data:

influxdb3 query \
  --database weather \
  --token "AUTH_TOKEN" \
  "SELECT * FROM weather WHERE time >= '2020-01-01T00:00:00' AND time <= '2020-01-05T00:00:00';"

Once you run that, you should see the query results printed in your terminal.
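For the load-your-own-data path mentioned above, here is a rough sketch of what a single line protocol point looks like: a measurement name, a comma-separated tag set, a field set, and a timestamp. The helper name lp_point and the measurement, tag, and field names are made up for illustration, and the helper assumes values contain no spaces or commas, so it does no escaping.

```shell
# Hypothetical helper: format one point as line protocol.
# Arguments: measurement, tag set (key=value), field set (key=value),
# and a timestamp in nanoseconds since the Unix epoch.
# Assumes values contain no spaces or commas, so no escaping is done.
lp_point() {
  printf '%s,%s %s %s\n' "$1" "$2" "$3" "$4"
}

# Example point with made-up names; the timestamp is 2020-01-01T00:00:00Z.
lp_point greenhouse room=north temp_c=21.5 1577836800000000000
```

The output of lp_point can be passed to influxdb3 write as the final argument, in the same position as the curl substitution in the write command earlier.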

Next steps

Now that you’ve started up InfluxDB and loaded and queried data, there are a few next steps to explore. If you want to configure your InfluxDB server, make sure to use the serve command the next time you start it up. You can set up Telegraf to load real data into your InfluxDB server. Or you can connect a data visualization tool like Grafana to do more than run one-off queries on your data. If you need help, have questions, or want to share feedback, you can join the InfluxDB Discord.