Getting Started with InfluxDB and Pandas

Navigate to:

InfluxData prides itself on prioritizing developer happiness. A large part of maintaining developer happiness is providing client libraries that allow users to interact with the database through the language and library of their choosing. Data analysis is the task most broadly associated with Python use cases, accounting for 58% of Python tasks, so it makes sense that Pandas is the second most popular library for Python users. The 2.0 InfluxDB Python Client Data supports Pandas DataFrames to invite those data scientists to use InfluxDB with ease.

In this tutorial, we’ll learn how to query our InfluxDB instance and return the data as a DataFrame. We’ll also explore some data science resources that exist as a part of the Client repo. To learn about how to get started with the InfluxDB Python Client Library, please take a look at this blog.

Pandas InfluxDB

Me eagerly consuming Pandas and InfluxDB Documentation. Photo by Sid Balachandran on Unsplash.

Data science resources

A variety of data science resources have been included in the InfluxDB Python Client repo to help you take advantage of the Pandas functionality of the client. I encourage you to take a look at the example notebooks. They are a collection of Jupyter Notebooks providing examples with a variety of time series data science and analytics solutions, e.g. how to integrate Tensorflow and Keras for predictions.

From InfluxDB to a DataFrame

Import the client and Pandas:

from influxdb_client import InfluxDBClient
import pandas as pd

Supply auth parameters:

my_token = my-token
my_org = "my-org"
bucket = "system"

Write your Flux query:

query= '''
from(bucket: "system")
|> range(start:-5m, stop: now())
|> filter(fn: (r) => r._measurement == "cpu")
|> filter(fn: (r) => r._field == "usage_user")
|> filter(fn: (r) => r.cpu == "cpu-total")'''

Query InfluxDB and return a Dataframe:

client = InfluxDBClient(url="http://localhost:9999", token=my_token, org=my_org, debug=False)
system_stats = client.query_api().query_data_frame(org=my_org, query=query)
display(system_stats.head())

From DataFrame to InfluxDB

Write DataFrame to InfluxDB::

From DataFrame to InfluxDB

from influxdb_client import InfluxDBClient, Point, WriteOptions
from influxdb_client.client.write_api import SYNCHRONOUS
# Preparing Dataframe: 
system_stats.drop(columns=['result', 'table','start','stop'])
# DataFrame must have the timestamp column as an index for the client. 
system_stats.set_index("_time")
_write_client.write(bucket.name, record=system_stats, data_frame_measurement_name='cpu',
                    data_frame_tag_columns=['cpu'])

Close client:

_write_client.__del__()
client.__del__()

Pandas to complement Flux with InfluxDB

Although Flux has many of the data transformation capabilities that Pandas does, InfluxDB values developers’ time. If you are dealing with a smaller dataset, you might not have much incentive to do those transformations on the server side or learn Flux. Hopefully this Pandas capability can help you execute your time series analysis faster. As always, if you run into hurdles, please share them on our community site or Slack channel. We’d love to get your feedback and help you with any problems you run into.