TL;DR Python, Pandas Dataframes, and InfluxDB
By Susannah Brodnitz / Nov 09, 2022 / InfluxDB, Community
InfluxDB has over a dozen client libraries so developers can get started more easily and program in the language they’re most comfortable with. One of our most popular options is the Python client library. InfluxDB supports not just Python but pandas, a tool popular with data scientists for analyzing and manipulating data. You can use the client library to output data from InfluxDB into a DataFrame format pandas can ingest, and you can write pandas DataFrames directly to InfluxDB. This makes it simple to incorporate InfluxDB into your data science applications and take advantage of real-time monitoring and alerting.
Querying data from InfluxDB to a Pandas Dataframe
To query and return a pandas DataFrame in InfluxDB you first need to install the Python client library. To set it up you need to download that library into your Python project and set your credentials, including your URL, token, and the name of the bucket you want to send data to and read data from. Once you’ve set up your credentials you can query data.
Here’s an example query:
query= ‘’’ from (bucket: “system”) |> range(start: -5m, stop: now()) |> filter(fn: (r) => r._measurement = “cpu”) |> pivot(rowKey: [“_time”], ColumnKey: [“tag”], ValueColumn: “value”) |> keep(columns: [“_time”, “usage_user”, “cpu”]) ‘’’ client = InfluxDBClient(url, token, org, debug = false) system_stats = client.query.api().query_data_frame() display(system_stats.head())
This query is written in Flux, InfluxDB’s scripting and query language. It asks for data from the bucket “system” from the past five minutes and filters on the measurement “cpu.” The two most important functions to keep in mind when you’re working with InfluxDB and pandas are
Pivot changes the data format to a column. Pandas dataframes are two-dimensional tables so pandas functions and tools expect data to be in that format and you’ll need to pivot the data you query from InfluxDB.
Keep tells the query what data to keep in this column and gets rid of any other columns you don’t want to include in the output pandas DataFrame.
After you’ve written your query, you need to call the InfluxDB client with your credentials and use the query API with the
query_data_frame() function to get the DataFrame format. Then you can display the data inside of the pandas DataFrame with
Writing data from a Pandas Dataframe to InfluxDB
You can also write data from a pandas dataframe into InfluxDB. Here’s some example code that does that:
system_stats.drop(columns = [‘result’, ‘table’, ‘start’, ‘stop’]) system_stats.set_index(“_time”) _write_client.write(bucket.name, record = system_stats)
Any data that you don’t remove using the
drop function from your pandas DataFrame becomes a field when you write the data to InfluxDB. In this example query we’ve removed columns we don’t want to send as fields to InfluxDB because we want to make them tags. You also need to set your index, which will be your timestamp in InfluxDB. Then you need to use the Python client library
write function to write the data to InfluxDB. It takes in a bucket name and the record you want to write to that bucket. You also need to set up credentials for the client, as in the first example.
One thing to consider is the timestamp. Pandas comes with a datetime format. You can use that or you can use one of the following time formats:
“2018-10-26”, “2018-10-26 12:00”, or “2018-10-26 12:00:00-05:00”
Flux is a powerful analytics language, but you might want to use pandas because it’s very popular within the Python community. It has a lot of support and many people are already familiar with it and comfortable working with DataFrames. Combining pandas with the InfluxDB Python client library lets you easily incorporate InfluxDB into your applications to get the benefits of a purpose-built time series database.
To learn more about using pandas DataFrames with InfluxDB you can watch our Meet the Developer video on this topic.