InfluxData prides itself on prioritizing developer happiness. A large part of maintaining developer happiness is providing client libraries that allow users to interact with the database through the language and library of their choosing. Data analysis is the task most broadly associated with Python use cases, accounting for 58% of Python tasks, so it makes sense that Pandas is the second most popular library for Python users. The 2.0 InfluxDB Python Client Data supports Pandas DataFrames to invite those data scientists to use InfluxDB with ease.
In this tutorial, we’ll learn how to query our InfluxDB instance and return the data as a DataFrame. We’ll also explore some data science resources that exist as a part of the Client repo. To learn about how to get started with the InfluxDB Python Client Library, please take a look at this blog.
Data science resources
A variety of data science resources have been included in the InfluxDB Python Client repo to help you take advantage of the Pandas functionality of the client. I encourage you to take a look at the example notebooks. They are a collection of Jupyter Notebooks providing examples with a variety of time series data science and analytics solutions, e.g. how to integrate Tensorflow and Keras for predictions.
From InfluxDB to a DataFrame
Import the client and Pandas:
from influxdb_client import InfluxDBClient import pandas as pd
Supply auth parameters:
my_token = my-token my_org = "my-org" bucket = "system"
Write your Flux query:
query= ''' from(bucket: "system") |> range(start:-5m, stop: now()) |> filter(fn: (r) => r._measurement == "cpu") |> filter(fn: (r) => r._field == "usage_user") |> filter(fn: (r) => r.cpu == "cpu-total")'''
Query InfluxDB and return a Dataframe:
client = InfluxDBClient(url="http://localhost:9999", token=my_token, org=my_org, debug=False) system_stats = client.query_api().query_data_frame(org=my_org, query=query) display(system_stats.head())
From DataFrame to InfluxDB
Import the script (not part of the client):
from dataframe_to_line_protocol import lp
Convert DataFrame into line protocol:
lines = lp(system_stats,"_measurement","cpu","_field","_value","_time")
Write points to InfluxDB:
from influxdb_client import InfluxDBClient, Point, WriteOptions from influxdb_client.client.write_api import SYNCHRONOUS _write_client = client.write_api(write_options=WriteOptions(batch_size=1000, flush_interval=10_000, jitter_interval=2_000, retry_interval=5_000)) _write_client.write(bucket, my_org, lines)
The full script that accompanies this tutorial can be found here.
Pandas to complement Flux with InfluxDB
Although Flux has many of the data transformation capabilities that Pandas does, InfluxDB values developers’ time. If you are dealing with a smaller dataset, you might not have much incentive to do those transformations on the server side or learn Flux. Hopefully this Pandas capability can help you execute your time series analysis faster. As always, if you run into hurdles, please share them on our community site or Slack channel. We’d love to get your feedback and help you with any problems you run into.