Resources for Tasks in InfluxDB 3.0


If you’re an InfluxDB v2 user, you might be wondering what happened to the task engine in InfluxDB 3.0. The answer is that we removed it in order to support broader interoperability with other task tools. V3 enables users to leverage any existing ETL tool rather than being locked into the limited capabilities of the Flux task engine.

Additionally, InfluxDB 3.0 prioritizes query and write performance to enable you to query, transform, and write large datasets with confidence and ease. However, having more choices requires more initial decision-making. In this post, we’ll highlight some third-party ETL tools and describe the advantages of each. This isn’t designed to be an exhaustive comparison of every existing ETL tool. Rather, I’ll focus on tools that we have existing examples for.

Note: All of these approaches and tools use the InfluxDB v3 Python client library. This client library contains methods for querying and writing Pandas and Polars DataFrames, which simplifies ETL processes and gives users access to the many Python libraries available for that workload.

Quix

Quix is a complete solution for building, deploying, and monitoring event-streaming applications using Kafka and Python. Quix is designed specifically for processing time series data and comes in both cloud and on-prem offerings. Its UI simplifies the processing, building, and maintenance of event streaming and ETL processes.

Some advantages of Quix include:

  • Plugins for querying and writing data from/to your InfluxDB v3 instance and integrating InfluxDB v3 into your Quix pipeline.
  • Quix can orchestrate any container.

Some resources for getting started with Quix and InfluxDB 3.0 include:

Mage.ai

Mage is an open source data pipeline tool for transforming and integrating data. In essence, it’s an open source alternative to Apache Airflow. It also contains a UI that simplifies the ETL creation process. Mage clearly documents how to deploy on AWS, Azure, DigitalOcean, and GCP with Terraform and Helm Charts.

To summarize, some of the advantages of using Mage include:

  • Mage is open source.
  • Mage has the following features:
    • Orchestration: schedule and manage data pipelines for observability
    • Notebook editor: interactive Python, SQL, and R editor for coding data pipelines
    • Data integration: synchronize data from 3rd-party sources with your internal destinations
    • Streaming: ingest and transform real-time data
    • dbt: build, run, and manage your dbt models with Mage

Some resources for getting started with Mage and InfluxDB 3.0 include:

AWS Fargate

AWS Fargate is a serverless compute engine for containers that works with both Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS). With Fargate, you can run containers without the need to provision, configure, or scale virtual machine clusters. It also enables flexible resource management and configuration. This allows you to fine-tune container performance, making Fargate ideal for complex data processing.

To summarize, some of the advantages of using AWS Fargate include:

  • Serverless Simplicity: Fargate abstracts the underlying infrastructure, allowing developers to deploy containers without worrying about provisioning, scaling, or managing EC2 instances.
  • Cost Efficiency: Fargate charges users based on the resources consumed by the containers, providing cost savings by eliminating the need to maintain idle EC2 instances.
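
A Fargate container simply runs the process you give it, so the schedule that the v2 task engine used to manage has to live in your own code or in an external scheduler. Here's a minimal sketch of a fixed-interval loop you could use as a container entrypoint. It isn't an official InfluxData pattern: `run_every` and its parameters are hypothetical names, and the injectable `clock` and `sleep` arguments exist only to make the loop easy to test.

```python
import time


def run_every(task, interval_s, max_runs=None,
              clock=time.monotonic, sleep=time.sleep):
    """Call `task()` every `interval_s` seconds, measured start to start.

    `max_runs=None` loops forever, which is what a long-running
    container would normally want.
    """
    runs = 0
    next_start = clock()
    while True:
        task()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            return runs
        next_start += interval_s
        delay = next_start - clock()
        if delay > 0:
            sleep(delay)
        else:
            # The task overran its slot: start the next run immediately
            # and reset the schedule from the current time.
            next_start = clock()
```

In a container, `task` would be a function that queries, transforms, and writes data with the Python client library, and you'd call something like `run_every(task, 3600)` from the entrypoint script.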

Some resources for getting started with AWS Fargate and InfluxDB 3.0 include:

FaaS Tools

Function as a Service (FaaS) tools are event-driven, serverless computing platforms. Examples include AWS Lambda, Google Cloud Functions, and Azure Functions. Some advantages and considerations when using FaaS tooling are:

  • They let developers run code without provisioning or managing servers.
  • They include automatic scale-up.
  • They allow users to focus on developing complex analytics and data science logic. However, there is less granular control over the computing environment.
  • If a task is intermittent or has a variable load, FaaS won’t charge for idle compute resources. If the workload is consistent, Fargate (see above) might be the more cost-effective option. Similarly, Fargate might be optimal if your task has a long execution time (e.g., greater than 10 minutes).
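
FaaS platforms cap how long a single invocation can run, which is part of why long tasks fit Fargate better. One way to stay under that cap is to split the work into chunks and stop before time runs out. The sketch below assumes an AWS Lambda style `handler(event, context)` and uses `context.get_remaining_time_in_millis()`, the method that Lambda's Python context object provides. Everything else is hypothetical: `make_handler`, `process_chunk`, the `chunks` key in the event, and the 5-second safety margin are illustration only.

```python
def make_handler(process_chunk, min_remaining_ms=5_000):
    """Build a Lambda-style handler that processes chunks until time runs low.

    `process_chunk` is any callable that handles one unit of work,
    for example one time window to query, transform, and write back.
    """
    def handler(event, context):
        # Hypothetical event shape: {"chunks": [...]}
        chunks = list(event.get("chunks", []))
        done = 0
        for chunk in chunks:
            # Stop early rather than let the platform kill the invocation.
            if context.get_remaining_time_in_millis() < min_remaining_ms:
                break
            process_chunk(chunk)
            done += 1
        # Report leftover work so the caller can schedule another invocation.
        return {"processed": done, "remaining": chunks[done:]}

    return handler
```

If `remaining` regularly comes back non-empty, that's a hint the workload may be a better fit for a container than for a function.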

While InfluxData has yet to create a list of PoCs with FaaS tooling, you’ll want to leverage the InfluxDB v3 Python client library to query, transform, and write your data. Here are some resources for getting started with the Python client library:

python
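import polars as pl
from influxdb_client_3 import InfluxDBClient3

# Connect to InfluxDB v3; supply your own API token, host, and org.
client = InfluxDBClient3(
    token="",
    host="eu-central-1-1.aws.cloud2.influxdata.com",
    org="6a841c0c08328fb1")

# Run a SQL query; mode='all' returns the full result as a PyArrow table.
sql = 'SELECT * FROM caught LIMIT 10'
table = client.query(database="pokemon-codex", query=sql, language='sql', mode='all')

# Convert the Arrow table to a Polars DataFrame.
df = pl.from_arrow(table)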
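
The snippet above stops once the query result is loaded. A migrated task usually goes on to transform the data and write it back, so here's a rough sketch of those two steps in plain Python (no DataFrame library needed). Treat the details as assumptions: the `level` field, the `caught_mean` measurement, and the one-hour window are made up for illustration, the `time` values are assumed to be timezone-aware datetimes, and `run_task` expects a client object with `query` and `write` methods like those on `InfluxDBClient3`.

```python
from collections import defaultdict


def downsample_mean(rows, field, window_s):
    """Average `field` over fixed windows of `window_s` seconds.

    `rows` is a list of dicts with a timezone-aware `time` datetime,
    such as the output of `table.to_pylist()` on a PyArrow table.
    """
    buckets = defaultdict(list)
    for row in rows:
        epoch = int(row["time"].timestamp())
        buckets[epoch - epoch % window_s].append(row[field])
    return [
        {"time": start, field: sum(values) / len(values)}
        for start, values in sorted(buckets.items())
    ]


def to_line_protocol(measurement, points, field):
    """Render points as line protocol with nanosecond timestamps."""
    return [
        f"{measurement} {field}={p[field]} {p['time'] * 1_000_000_000}"
        for p in points
    ]


def run_task(client, database):
    """Query, downsample, and write back; returns the number of lines written."""
    table = client.query(
        query="SELECT time, level FROM caught",  # hypothetical columns
        language="sql", mode="all", database=database,
    )
    points = downsample_mean(table.to_pylist(), "level", 3600)
    lines = to_line_protocol("caught_mean", points, "level")
    if lines:
        # Send the whole batch as one newline-separated line protocol string.
        client.write(record="\n".join(lines), database=database)
    return len(lines)
```

Keeping the transform separate from the client calls means you can unit test the logic without a database, and drop the same function into a container, a FaaS handler, or a pipeline tool.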

Questions?

I hope this post helps you jumpstart migrating your tasks to InfluxDB 3.0 and taking advantage of its increased interoperability with ETL and data-pipelining tools. Get started with InfluxDB Cloud 3.0 here. If you need help, please reach out via our community site or Slack channel.