Getting Started with InfluxDB and Pandas

InfluxData prides itself on prioritizing developer happiness. A large part of maintaining developer happiness is providing client libraries that allow users to interact with the database through the language and library of their choosing. Data analysis is the task most broadly associated with Python use cases, accounting for 58% of Python tasks, so it makes sense that Pandas is the second most popular library for Python users. The InfluxDB 2.0 Python Client supports Pandas DataFrames so that data scientists can use InfluxDB with ease.

In this tutorial, we’ll learn how to query our InfluxDB instance and return the data as a DataFrame. We’ll also explore some data science resources that are part of the Client repo. To learn how to get started with the InfluxDB Python Client Library, please take a look at this blog.

Me eagerly consuming Pandas and InfluxDB Documentation. Photo by Sid Balachandran on Unsplash.

Data science resources

A variety of data science resources have been included in the InfluxDB Python Client repo to help you take advantage of the Pandas functionality of the client. I encourage you to take a look at the example notebooks. They are a collection of Jupyter Notebooks that demonstrate a variety of time series data science and analytics solutions, e.g. how to integrate TensorFlow and Keras for predictions.

From InfluxDB to a DataFrame

Import the client and Pandas:

from influxdb_client import InfluxDBClient
import pandas as pd

Supply auth parameters:

my_token = "my-token"
my_org = "my-org"
bucket = "system"

Write your Flux query:

query = '''
from(bucket: "system")
|> range(start:-5m, stop: now())
|> filter(fn: (r) => r._measurement == "cpu")
|> filter(fn: (r) => r._field == "usage_user")
|> filter(fn: (r) => r.cpu == "cpu-total")'''
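If you expect to run the same query against different buckets or time ranges, it can help to build the Flux string from parameters. The following is a minimal sketch, not part of the client: `build_cpu_query` and its parameter names are hypothetical and only reproduce the shape of the query above.

```python
def build_cpu_query(bucket: str,
                    start: str = "-5m",
                    measurement: str = "cpu",
                    field: str = "usage_user",
                    cpu: str = "cpu-total") -> str:
    """Return a Flux query with the same shape as the one in this tutorial.

    Hypothetical helper: values are interpolated as-is, so only pass
    trusted strings.
    """
    return (
        f'from(bucket: "{bucket}")\n'
        f'|> range(start: {start}, stop: now())\n'
        f'|> filter(fn: (r) => r._measurement == "{measurement}")\n'
        f'|> filter(fn: (r) => r._field == "{field}")\n'
        f'|> filter(fn: (r) => r.cpu == "{cpu}")'
    )


example_query = build_cpu_query("system", start="-15m")
print(example_query)
```

The resulting string can be passed as the query argument in the same way as the hand-written one.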

Query InfluxDB and return a DataFrame:

client = InfluxDBClient(url="http://localhost:9999", token=my_token, org=my_org, debug=False)
system_stats = client.query_api().query_data_frame(org=my_org, query=query)
# display() is available in Jupyter; use print() in a plain Python script
display(system_stats.head())
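The returned DataFrame typically carries bookkeeping columns such as result and table next to _time, _value, _field, _measurement and the tag columns. Here is a small sketch of tidying it up for analysis. It uses a made-up stand-in DataFrame rather than a live query so that it runs anywhere, and `tidy` is a hypothetical helper name.

```python
import pandas as pd

# Stand-in for the result of query_data_frame(); the values are invented.
sample = pd.DataFrame({
    "result": ["_result"] * 3,
    "table": [0, 0, 0],
    "_time": pd.to_datetime([
        "2020-01-01T00:00:00Z",
        "2020-01-01T00:00:10Z",
        "2020-01-01T00:00:20Z",
    ]),
    "_value": [1.5, 2.0, 2.5],
    "_field": ["usage_user"] * 3,
    "_measurement": ["cpu"] * 3,
    "cpu": ["cpu-total"] * 3,
})


def tidy(df: pd.DataFrame) -> pd.DataFrame:
    """Drop bookkeeping columns (if present) and index rows by timestamp."""
    return (
        df.drop(columns=["result", "table"], errors="ignore")
          .set_index("_time")
          .sort_index()
    )


tidy_stats = tidy(sample)
print(tidy_stats["_value"].describe())
```

With a time index in place, the usual Pandas time series tools such as slicing by date become available.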

From DataFrame to InfluxDB

Import the script (not part of the client):

from dataframe_to_line_protocol import lp

Convert DataFrame into line protocol:

lines = lp(system_stats,"_measurement","cpu","_field","_value","_time")
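Because the lp script is not part of the client, it may help to see roughly what such a conversion involves. The sketch below is my own stand-in, not the script's actual implementation: `to_line_protocol` is a hypothetical function, its argument order simply mirrors the call above, and it assumes the tag column is named after the tag key. It emits one line per row in the form measurement,tag_key=tag_value field_key=field_value timestamp, with the timestamp in nanoseconds. It does not escape spaces or commas, which real line protocol requires.

```python
import pandas as pd


def to_line_protocol(df, measurement_col, tag_key, field_col, value_col, time_col):
    """Hypothetical stand-in for lp(): build one line-protocol string per row."""
    lines = []
    for _, row in df.iterrows():
        # Timestamp.value is the number of nanoseconds since the Unix epoch
        ts_ns = pd.Timestamp(row[time_col]).value
        lines.append(
            f"{row[measurement_col]},{tag_key}={row[tag_key]} "
            f"{row[field_col]}={float(row[value_col])} {ts_ns}"
        )
    return lines


# Made-up single-row frame shaped like the query result
sample = pd.DataFrame({
    "_time": pd.to_datetime(["1970-01-01T00:00:01Z"]),
    "_value": [1.5],
    "_field": ["usage_user"],
    "_measurement": ["cpu"],
    "cpu": ["cpu-total"],
})

sample_lines = to_line_protocol(
    sample, "_measurement", "cpu", "_field", "_value", "_time")
print(sample_lines[0])
```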

Write points to InfluxDB:

from influxdb_client import WriteOptions

# Batching write client; the interval options are given in milliseconds
_write_client = client.write_api(write_options=WriteOptions(batch_size=1000,
                                                            flush_interval=10_000,
                                                            jitter_interval=2_000,
                                                            retry_interval=5_000))

# Write the list of line protocol strings to the bucket
_write_client.write(bucket, my_org, lines)

Close client:

# close() flushes any pending batches before releasing resources
_write_client.close()
client.close()

The full script that accompanies this tutorial can be found here.

Pandas to complement Flux with InfluxDB

Although Flux has many of the data transformation capabilities that Pandas does, InfluxDB values developers’ time. If you are dealing with a smaller dataset, you might not have much incentive to do those transformations on the server side or learn Flux. Hopefully this Pandas capability can help you execute your time series analysis faster. As always, if you run into hurdles, please share them on our community site or Slack channel. We’d love to get your feedback and help you with any problems you run into.
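As an illustration of the kind of client-side transformation meant here, the sketch below downsamples and smooths a small series with Pandas. The data is invented for the example. Flux offers comparable server-side functions (aggregateWindow and movingAverage), so this is a matter of convenience rather than capability.

```python
import pandas as pd

# Invented CPU usage readings, 30 seconds apart
usage = pd.Series(
    [10.0, 20.0, 30.0, 40.0],
    index=pd.to_datetime([
        "2020-01-01T00:00:00Z",
        "2020-01-01T00:00:30Z",
        "2020-01-01T00:01:00Z",
        "2020-01-01T00:01:30Z",
    ]),
    name="usage_user",
)

# Downsample to one mean value per minute
per_minute = usage.resample("1min").mean()

# Smooth with a two-point moving average (first value has no predecessor)
smoothed = usage.rolling(window=2).mean()

print(per_minute)
print(smoothed)
```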
