Forecasting with InfluxDB 3 and HuggingFace

Machine learning models must do more than make accurate predictions; they also need to adapt as the world around them changes. In real-world systems, data distributions shift due to seasonality, equipment wear, user behavior changes, or other external forces. If your models can’t keep up, the result is poor predictions. This can lead to outages, inefficiencies, or missed opportunities. That’s why forecasting systems need to be monitored and resilient, not just accurate. In this post, we’ll walk through a full-stack ML demo that shows exactly how to do that. We combine ML model monitoring with tooling to forecast time series data, detect when models start to drift, and automatically retrain them in response.

For example, in an industrial IoT setting, a predictive maintenance model might learn to detect motor failure based on temperature and vibration data. But over time, the equipment wears down, operators change procedures, or sensors begin to degrade, causing the model’s assumptions about “normal” to drift. If the model isn’t updated, it may miss early failure warnings or raise false alarms, leading to costly downtime or unnecessary maintenance. In finance, a trading model might rely on patterns that no longer hold true after market volatility shifts, leading to poor decisions if it isn’t retrained. These are the kinds of scenarios where detecting and responding to drift in real time becomes critical.

The demo uses a PyTorch-based LSTM model for forecasting, InfluxDB 3 for storing time series data and model metrics, and Hugging Face Hub for cloud-based model storage and versioning. It’s all wrapped in a Streamlit app so you can interactively explore the pipeline, from synthetic data generation to drift-aware retraining. InfluxDB, a purpose-built time series database (TSDB), is particularly well suited for this kind of pipeline. Time series workloads are naturally indexed by time and benefit from TSDB features like fast temporal queries, downsampling, retention policies, and streaming ingestion. By storing everything from raw inputs to forecasts and retraining events in InfluxDB, you get efficient storage and retrieval and a transparent audit trail of your model’s behavior over time.

The pipeline follows a modular ML lifecycle: generating sine wave-based time series data, writing that data into InfluxDB 3 Core (open source), and training an initial model. The model is then saved to Hugging Face. You can then simulate concept drift by injecting noise or offsets, and the system will automatically detect degraded performance using MSRE and MSE thresholds. When drift is detected, the pipeline allows you to trigger a retraining process, uploads the new model to Hugging Face, and logs the event for traceability. Every stage, from data generation to initial training, forecasts, drift detection, and retraining events, is captured in InfluxDB 3 Core to ensure full visibility of the model’s health over time.

For this demo, we’re using InfluxDB 3 Core, the open source version for recent data. It’s ideal for development, prototyping, and recent time series workloads. However, if you plan to scale this pipeline and need high availability, scalability, enhanced security, and long-term storage, then InfluxDB 3 Enterprise is a better fit. Luckily, switching between products is easy. InfluxDB 3 Enterprise is a superset of InfluxDB 3 Core, and migration happens in-place (for this project, it’s as simple as pulling the Enterprise Docker image instead of the Core image).

This project was built with Replit; you can find the corresponding project repo here.

A screenshot from the Streamlit application shows the difference between the original time series and the drifted data.

Requirements

To run this project locally, you’ll need the following:

  • Python 3.11+: For model training, inference, and app execution
  • InfluxDB: To store your time series data
  • Docker: Used to spin up a local instance of InfluxDB 3 Core
  • Hugging Face Account: To upload and download models from your repository
  • PyTorch: The library used for LSTM model implementation and persistence
  • Streamlit: To power the interactive UI that orchestrates the pipeline

Quick start

1. Clone the repository and set up the environment.
git clone https://github.com/yourusername/lstm-forecasting-drift-detection.git
cd lstm-forecasting-drift-detection
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
pip install -r requirements.txt
2. Start InfluxDB 3 Core with Docker.

Note: We are mounting our plugin directory here to prepare for this project’s next evolution, which utilizes the InfluxDB 3 Python Processing Engine.

docker run -it --rm --name influxdb3core \
  -v ~/influxdb3/data:/var/lib/influxdb3 \
  -v ~/influxdb3_huggingface_forecasting_monitoringplugins:/plugins \
  -p 8181:8181 \
  quay.io/influxdb/influxdb3-core:latest serve \
  --node-id my_host \
  --object-store file \
  --data-dir /var/lib/influxdb3 \
  --plugin-dir /plugins
3. Create the database and auth token for InfluxDB 3 Core.
docker exec -it influxdb3core influxdb3 database create --name timeseries
docker exec -it influxdb3core influxdb3 auth create --name my-token
4. Set up environment variables. Create a .env file in the project root.
INFLUXDB_HOST=http://localhost:8181
INFLUXDB_TOKEN=your_influxdb_token_here
INFLUXDB_DATABASE=timeseries
HF_TOKEN=your_huggingface_token_here
HF_REPO_ID=your_username/your_repo_name
5. Run the app.
streamlit run app.py

The Streamlit dashboard will be live at http://localhost:5000. The first step of the Streamlit app walks you through configuring your InfluxDB instance. The connection settings are automatically populated from your environment variables, but you can also configure them through the UI instead.

App walkthrough

Tab 0 InfluxDB Configuration: First, we need to connect our application to an InfluxDB instance to store our time series data and model metrics. The InfluxDB Configuration tab makes this straightforward. The app will also automatically populate your Connection Settings with env variables if you provide them.

Tab 1 Data Generation: Now that we’re connected, let’s generate some synthetic time series data to work with. This tab allows us to create a clean sine wave with customizable parameters.

Using the sidebar sliders, you can adjust:

  • Number of data points: Controls how many data points to generate
  • Noise level: Adds random variation to the sine wave
  • Frequency factor: Changes how quickly the wave oscillates
  • Amplitude: Sets the height of the wave

Click Generate New Data, and the application will:

  • Create a synthetic time series based on your parameters
  • Split it into training and testing sets
  • Write it to InfluxDB (if connected)
  • Display a visualization of the generated data
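
Conceptually, the generation step boils down to a noisy sine wave and a chronological split. Here is a minimal sketch; the function names, the cycle length, and the 80/20 split ratio are illustrative, not the repo's exact code:

```python
import numpy as np

def generate_sine_data(n_points=500, noise=0.2, freq_factor=1.0, amplitude=1.0, seed=42):
    """Generate a noisy sine wave, mirroring the sidebar parameters."""
    rng = np.random.default_rng(seed)
    x = np.arange(n_points)
    values = amplitude * np.sin(freq_factor * 2 * np.pi * x / 50)  # one cycle per ~50 points
    values += rng.normal(0, noise, n_points)                        # noise level slider
    return values

def train_test_split(values, train_frac=0.8):
    """Chronological split -- never shuffle time series data."""
    split = int(len(values) * train_frac)
    return values[:split], values[split:]

data = generate_sine_data(n_points=500)
train, test = train_test_split(data)
print(len(train), len(test))  # 400 100
```

Note that the split is chronological rather than random: shuffling would leak future values into the training set and make the forecast look deceptively good.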

Tab 2 Initial Model Training: Click Train LSTM Model to begin training. The application will:

  • Prepare sequences from your time series data
  • Scale the data for better convergence
  • Create and train an LSTM model with your chosen parameters
  • Display training progress and final metrics

The charts show:

  • Training and validation loss curves (to check for overfitting)
  • Model predictions on the test data
  • A detailed view of the last portion of predictions vs. actuals

The model accuracy metrics give you immediate feedback on how well your model learned the patterns in your data.
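
The "prepare sequences" and "scale the data" steps amount to min-max scaling plus sliding windows over the series. A minimal numpy sketch, where the window size of 24 and the function names are illustrative assumptions:

```python
import numpy as np

def scale_minmax(values):
    """Scale to [0, 1] for better LSTM convergence; keep params to invert later."""
    vmin, vmax = values.min(), values.max()
    return (values - vmin) / (vmax - vmin), vmin, vmax

def make_sequences(values, window=24):
    """Turn a 1-D series into (input window, next value) supervised pairs."""
    X, y = [], []
    for i in range(len(values) - window):
        X.append(values[i:i + window])
        y.append(values[i + window])
    return np.array(X), np.array(y)

series = np.sin(np.linspace(0, 8 * np.pi, 200))
scaled, vmin, vmax = scale_minmax(series)
X, y = make_sequences(scaled, window=24)
print(X.shape, y.shape)  # (176, 24) (176,)
```

Each row of `X` is one input window the LSTM sees; the matching entry of `y` is the next point it learns to predict.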

Pro Tip: If your training loss decreases but your validation loss increases, you might be overfitting. Try increasing the dropout rate or reducing the number of LSTM units.

Tab 3 Drift Injection: Now comes the interesting part! In real-world scenarios, data patterns often change over time, causing models to become less accurate. This phenomenon is known as data drift. This tab lets you simulate drift to see how it affects model performance.

From the left-hand panel, you can configure:

  • Drift type: Choose between “offset” (a vertical shift) or “noise” (increased randomness)
  • Drift start point: When the drift should begin (as a percentage of the dataset)
  • Drift magnitude: How severe the drift should be

Click Inject Drift to apply these changes to your data. The application will:

  • Take your original data and add drift at the specified point
  • Store the drifted data in InfluxDB 3 Core
  • Show a comparison between the original and drifted data

The visualization clearly highlights where the drift begins and how it affects the data pattern. The red portion represents the drifted section. Behind the scenes, the drifted data is stored in the drifted_data measurement in InfluxDB 3 Core, preserving both the original and modified versions.
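
The two drift types map directly onto simple array operations. A sketch of what the injection step does, assuming illustrative function and parameter names rather than the repo's exact API:

```python
import numpy as np

def inject_drift(values, drift_type="offset", start_frac=0.5, magnitude=2.0, seed=0):
    """Apply drift to the tail of a series, as in the Drift Injection tab."""
    drifted = values.copy()
    start = int(len(values) * start_frac)   # drift start point slider
    if drift_type == "offset":
        drifted[start:] += magnitude        # vertical shift
    elif drift_type == "noise":
        rng = np.random.default_rng(seed)   # increased randomness
        drifted[start:] += rng.normal(0, magnitude, len(values) - start)
    return drifted, start

clean = np.sin(np.linspace(0, 4 * np.pi, 100))
drifted, start = inject_drift(clean, "offset", start_frac=0.5, magnitude=2.0)
print(start, float(drifted[75] - clean[75]))  # 50 2.0
```

Keeping both `clean` and `drifted` around mirrors how the app preserves the original and modified versions in InfluxDB.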

Tab 4 Drift Detection: Once drift is present in our data, we need a systematic way to detect it. This tab demonstrates using error metrics to identify when model performance deteriorates.

You can configure:

  • Drift metric: Choose between MSRE (Mean Squared Relative Error) or MSE (Mean Squared Error)
  • Drift threshold: The error value above which drift is considered detected
  • Window size: Number of points to include in each sliding window for error calculation

Click Detect Drift to analyze model performance on the drifted data. The application will:

  • Run the original model on drifted data
  • Calculate error metrics using sliding windows
  • Determine when/if drift is detected
  • Visualize error metrics over time

The chart shows error metrics with a horizontal line representing your threshold. When the metrics cross this line, the system flags drift detection. You’ll also see the exact window where drift was first detected.

Pro Tip: MSRE is often more sensitive to relative changes in pattern, while MSE is better for detecting absolute magnitude changes. Choose based on what’s more important for your use case.
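
The detection logic itself is a sliding-window error calculation plus a threshold check. A minimal sketch, where the window size, epsilon guard, and function names are assumptions for illustration:

```python
import numpy as np

def window_errors(actual, predicted, window=20, metric="mse", eps=1e-8):
    """Compute the chosen error metric over sliding windows."""
    errors = []
    for i in range(len(actual) - window + 1):
        a = actual[i:i + window]
        p = predicted[i:i + window]
        if metric == "msre":
            # relative error; eps guards against division by zero
            errors.append(np.mean(((a - p) / (a + eps)) ** 2))
        else:
            errors.append(np.mean((a - p) ** 2))
    return np.array(errors)

def detect_drift(errors, threshold):
    """Return the index of the first window whose error crosses the threshold."""
    above = np.where(errors > threshold)[0]
    return int(above[0]) if len(above) else None

actual = np.ones(100)
predicted = np.ones(100)
predicted[60:] += 1.0  # model predictions go wrong at t=60
errs = window_errors(actual, predicted, window=20, metric="mse")
print(detect_drift(errs, threshold=0.01))  # 41
```

Detection fires at window 41 because that is the first 20-point window that overlaps the bad region starting at t=60, which is exactly the lag-versus-sensitivity trade-off the window size slider controls.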

Tab 5 Model Retraining: Once drift is detected, the appropriate response is usually to retrain the model on more recent data that includes the new patterns. This tab demonstrates how to retrain and compare model performance.

Click Retrain Model with Drifted Data to:

  • Create a new training dataset that includes drifted data
  • Train a new model with the same architecture but on updated data
  • Compare predictions from both the original and retrained models
  • Calculate improvement metrics

The visualization shows three lines:

  • Actual values (ground truth)
  • Predictions from the original model
  • Predictions from the retrained model

You can clearly see how the retrained model adapts to the new pattern while the original model continues to follow the old pattern. The metrics quantify this improvement in terms of reduced error.
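
The improvement metric is essentially the relative reduction in error between the two models. A small sketch of that comparison, with illustrative data and function names:

```python
import numpy as np

def improvement(actual, pred_original, pred_retrained):
    """Percent reduction in MSE from the original to the retrained model."""
    mse_orig = np.mean((actual - pred_original) ** 2)
    mse_new = np.mean((actual - pred_retrained) ** 2)
    return 100.0 * (mse_orig - mse_new) / mse_orig

actual = np.array([1.0, 2.0, 3.0, 4.0])
pred_original = actual + 1.0   # stale model: constant offset error
pred_retrained = actual + 0.1  # retrained model tracks much closer
print(round(improvement(actual, pred_original, pred_retrained), 1))  # 99.0
```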

Tab 6 Model Persistence: Our final tab handles saving and loading models to/from Hugging Face, enabling model versioning and sharing capabilities.

You’ll see fields for:

  • Hugging Face Repository ID: Where to store your models (format: “username/repo-name”)
  • Model Name: Identifier for this specific model

Click Save Model to Hugging Face to:

  • Serialize the current model (either original or retrained)
  • Upload it to your Hugging Face repository
  • Record metadata about the saved model
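
The save step comes down to serializing the model to a file and pushing that file to the Hub. A hedged sketch using `huggingface_hub`'s `upload_file`: the pickled dictionary stands in for the real PyTorch state dict (the demo would use something like `torch.save(model.state_dict(), path)`), and the upload only runs when credentials are configured:

```python
import os
import pickle
import tempfile

# Hypothetical model state for illustration; the demo serializes a
# PyTorch state dict instead.
model_state = {"lstm_units": 50, "weights": [0.1, 0.2, 0.3]}

path = os.path.join(tempfile.gettempdir(), "lstm_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model_state, f)

# Upload only when credentials are configured (HF_TOKEN / HF_REPO_ID from .env).
if os.environ.get("HF_TOKEN") and os.environ.get("HF_REPO_ID"):
    from huggingface_hub import HfApi
    api = HfApi(token=os.environ["HF_TOKEN"])
    api.upload_file(
        path_or_fileobj=path,
        path_in_repo="lstm_model.pkl",
        repo_id=os.environ["HF_REPO_ID"],
    )
print(os.path.exists(path))  # True
```

Because every upload creates a new commit in the Hub repository, this also gives you model versioning for free: each retraining event leaves a retrievable snapshot.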

Code overview

The project is organized into modular utility scripts that mirror the structure of the ML pipeline.

  • data_generator.py handles synthetic data creation and drift injection, allowing you to simulate real-world scenarios with evolving patterns.
  • model.py defines the LSTM architecture, training loop, forecasting logic, and model persistence methods.
  • drift_detection.py implements statistical drift detection using MSRE and MSE metrics with support for sliding windows.
  • huggingface_utils.py manages saving and loading models from the Hugging Face Hub, enabling easy model versioning and sharing.
  • influxdb_utils.py handles all interactions with InfluxDB 3 Core, from writing training metrics to querying time series data.

Together, these scripts form a complete forecasting monitoring loop—each one loosely coupled, testable, and easy to extend.

Limitations and Python Processing Engine

While this demo provides a full walkthrough of forecasting, drift detection, and retraining, it’s currently designed as a static application: each step runs once when manually triggered via the Streamlit UI. In real-world production systems, forecasting and monitoring need to happen continuously and autonomously. The natural next step for this project is to integrate with the InfluxDB 3 Python Processing Engine, which allows these scripts to run on a user-defined schedule or be triggered on request. This would enable automated forecasting, real-time drift monitoring, and hands-free retraining, all driven directly by new data arriving in InfluxDB. For example, you could generate new data every hour with a schedule trigger, monitor for drift in real time with a WAL-flush trigger, kick off a model retrain when needed with an HTTP request trigger, and push updated models to Hugging Face, all without user intervention. Moving from manual to scheduled execution is what transforms this project from an educational demo into a robust model monitoring pipeline.

Looking ahead: from static scripts to plugins

For example, you could easily create a Data Generator Schedule Trigger with the following code:

"""
Simple Sine Wave Generator for InfluxDB 3 Processing Engine

This plugin generates a sine wave with noise and writes it to InfluxDB.
"""

import numpy as np
from datetime import datetime, timedelta
def process_scheduled_call(influxdb3_local, call_time, args=None):
    """
    Generate a sine wave with noise and write it to InfluxDB.

    Parameters:
    -----------

    influxdb3_local : InfluxDB3Local
        Interface for interacting with InfluxDB
    call_time : str
        The time the scheduled call was made
    args : dict, optional
        Arguments for data generation:
        - measurement: Name of the measurement/table (default: 'synthetic_data')
        - periods: Number of data points to generate (default: 168)
        - amplitude: Sine wave amplitude (default: 10.0)
        - noise: Noise level (default: 0.5)
    """

    # Parse arguments with defaults
    args = args or {}
    measurement = args.get("measurement", "synthetic_data")
    periods = int(args.get("periods", "168"))
    amplitude = float(args.get("amplitude", "10.0"))
    noise = float(args.get("noise", "0.5"))

    influxdb3_local.info(f"Generating {periods} data points for measurement '{measurement}'")

    # Get current time and calculate start time (hourly data)
    end_time = datetime.now()
    start_time = end_time - timedelta(hours=periods)

    # Generate sine wave with noise
    for i in range(periods):
        # Calculate timestamp for this point
        timestamp = start_time + timedelta(hours=i)
        unix_nano = int(timestamp.timestamp() * 1e9)

        # Generate sine value (full cycle every 24 points)
        x = (i / 24) * 2 * np.pi
        value = amplitude * np.sin(x)

        # Add noise
        if noise > 0:
            value += np.random.normal(0, noise)

        # Create and write point
        line = LineBuilder(measurement)
        line.tag("source", "generator")
        line.float64_field("value", value)
        line.time_ns(unix_nano)

        influxdb3_local.write(line)

    influxdb3_local.info(f"Successfully generated and wrote {periods} data points")

    # Log a summary of what was written before returning
    stats = {
        "measurement": measurement,
        "points_generated": periods,
        "start_time": start_time.isoformat(),
        "end_time": end_time.isoformat(),
        "amplitude": amplitude,
        "noise_level": noise
    }
    influxdb3_local.info(f"Statistics: {stats}")
    return stats

This plugin is designed to run on a schedule within the InfluxDB 3 Python Processing Engine. Each time it’s triggered, it generates a configurable number of sine wave data points with added noise, timestamps them at hourly intervals, and writes them to a specified measurement in InfluxDB. The core logic lives in the process_scheduled_call() function, which is the required entry point for scheduled plugins. It uses the influxdb3_local interface provided by the Processing Engine to log messages and write data. Data points are formatted using the LineBuilder class to construct line protocol records with tags, fields, and nanosecond-precision timestamps. At the end, it logs a summary of what was written, making this plugin a useful tool for simulating time series data streams in automated workflows.

Then you would create the trigger with:

influxdb3 create trigger \
  --trigger-spec "every:1m" \
  --plugin-filename "plugins/data_generator.py" \
  --database timeseries \
  data_generator_trigger
And enable it with:

influxdb3 enable trigger --database timeseries data_generator_trigger

It’s also worth mentioning that you don’t have to use the LineBuilder class to write data in an InfluxDB 3 Core or Enterprise Python Processing Plugin. You can use the InfluxDB 3 Python Client Library instead, especially in scheduled or on-request plugins. This approach can be more convenient when you’re writing larger datasets, such as full Pandas DataFrames, all at once. For example, when model drift is detected and a retraining step is triggered, it often makes sense to write the resulting metrics or retrained model outputs in batch rather than looping through row-by-row with LineBuilder. This should also make converting our existing scripts to an InfluxDB 3 Python Processing Engine pipeline easier since many already operate on DataFrames.
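
A hedged sketch of that batch approach using the `influxdb3-python` client: the metric values and column names are invented for illustration, and the write itself is guarded so it only runs against a configured instance:

```python
import os
import pandas as pd

# Build a DataFrame of drift-detection metrics to write in one batch
# (illustrative values; column names are assumptions).
df = pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=3, freq="h"),
    "model": ["lstm_v1"] * 3,            # tag column
    "msre": [0.01, 0.04, 0.12],          # field
    "drift_detected": [False, False, True],
}).set_index("time")

# Write the whole frame at once; requires the influxdb3-python client and a
# running instance (host/token/database come from the .env file).
if os.environ.get("INFLUXDB_TOKEN"):
    from influxdb_client_3 import InfluxDBClient3
    client = InfluxDBClient3(
        host=os.environ.get("INFLUXDB_HOST", "http://localhost:8181"),
        token=os.environ["INFLUXDB_TOKEN"],
        database=os.environ.get("INFLUXDB_DATABASE", "timeseries"),
    )
    client.write(df, data_frame_measurement_name="drift_metrics",
                 data_frame_tag_columns=["model"])
print(len(df))  # 3
```

Writing a DataFrame in one call avoids the per-point loop entirely, which matters when a retraining step produces hundreds of metric rows at once.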

Conclusion

This project helps to start bridging the gap between machine learning and model monitoring. Combining InfluxDB 3 for time series storage and Hugging Face for model versioning shows how you can monitor and adapt models as data evolves over time. While the current app runs in a controlled, step-by-step environment, the architecture lays the groundwork for a fully automated system. With just a few additions, like scheduled plugins powered by the InfluxDB 3 Processing Engine, you can turn this demo into a more robust workflow for real-time forecasting. Whether you’re monitoring sensor data, energy usage, or financial trends, this approach gives you the tools to stay ahead of model drift and keep your predictions reliable.

I encourage you to look at the InfluxData/influxdb3_plugins repo as we add examples and plugins. Also, please contribute your own! To learn more about building your own plugin, and for examples of other alert plugins, check out these resources:

I invite you to contribute any plugin that you create. Check out our Getting Started Guide for Core and Enterprise, and share your feedback with our development team on Discord in the #influxdb3_core channel, Slack in the #influxdb3_core channel, or our Community Forums.