InfluxDB 3.0 Python Client Update: Adding Polars Support

It’s been a while since we posted about the InfluxDB 3.0 Python Client. Let’s take a look at what’s new!

Polars Dataframe ingest

In 2023, a new kid on the block gained popularity in the data analytics space: Polars. The Polars DataFrame library is an alternative to the OG data frame package, Pandas. Although both cater to the same use cases, they are built on fundamentally different technologies.


One of the most frequent community requests we received was for greater compatibility with Polars. Since Polars is built on Apache Arrow, we extended the mode parameter of the query function to accept polars. Simply query the data and set mode to 'polars', as below:

import polars as pl
from influxdb_client_3 import InfluxDBClient3

with InfluxDBClient3(
        token="",  # your InfluxDB token
        host="eu-central-1-1.aws.cloud2.influxdata.com",
        org="6a841c0c08328fb1") as client:

    sql = 'SELECT * FROM caught LIMIT 100000'
    # mode='polars' returns the query result as a Polars DataFrame
    df = client.query(database="pokemon-codex", query=sql, language='sql', mode='polars')
    print(df, flush=True)

Within the underlying client code, we call the Polars function from_arrow(), which converts the Arrow table returned by the query into a Polars DataFrame. Note: you must install the Polars library to use this mode.

Ingestion was a slightly different story. Like V1 and V2, InfluxDB V3 expects line protocol (LP) as its primary ingestion format, which means our client libraries have to convert data to LP. Polars provides an extremely efficient UDF feature, which made the new converter straightforward to implement. We built the new Polars data frame converter into the preexisting data frame converter class. Here is an example:

import polars as pl
from influxdb_client_3 import InfluxDBClient3, InfluxDBError, WriteOptions, write_client_options

class BatchingCallback(object):

    def success(self, conf, data: str):
        print(f"Written batch: {conf}, data: {data}")

    def error(self, conf, data: str, exception: InfluxDBError):
        print(f"Cannot write batch: {conf}, data: {data} due: {exception}")

    def retry(self, conf, data: str, exception: InfluxDBError):
        print(f"Retryable error occurs for batch: {conf}, data: {data} retry: {exception}")

callback = BatchingCallback()

write_options = WriteOptions(batch_size=10000,
                             flush_interval=10_000,
                             jitter_interval=2_000,
                             retry_interval=5_000,
                             max_retries=10,
                             max_retry_delay=15_000,
                             exponential_base=2,
                             max_close_wait=900_000)

# Pass the batching settings with the write_options keyword so the write API picks them up
wco = write_client_options(success_callback=callback.success,
                           error_callback=callback.error,
                           retry_callback=callback.retry,
                           write_options=write_options)

client = InfluxDBClient3(
    token="token",
    host="eu-central-1-1.aws.cloud2.influxdata.com",
    org="6a841c0c08328fb1",
    enable_gzip=True,
    write_client_options=wco)

pl_df = pl.read_parquet('pokemon_100_000.parquet')

client.write(database="pokemon-codex",
             record=pl_df,
             data_frame_measurement_name='caught',
             data_frame_tag_columns=['trainer', 'id', 'num'],
             data_frame_timestamp_column='timestamp')

client.close()

As you can see, the call takes the same parameters we would use when writing a Pandas data frame. The write API detects the data frame type and calls the right converter.

Top Tip: Make sure to set data_frame_timestamp_column to the name of your timestamp column. Unlike Pandas, Polars data frames have no index, so we cannot automatically determine which column holds the timestamp.
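
To make the conversion to line protocol more concrete, here is a simplified, pure-Python sketch of what turning one row into an LP line involves. It is not the client's actual converter: the helper names (escape_key, format_field, row_to_line) and the sample row are hypothetical, and the sketch assumes a timezone-aware timestamp and nanosecond precision.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def escape_key(text: str) -> str:
    """Escape commas, equals signs and spaces in tag keys, tag values and field keys."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_field(value) -> str:
    """Render a field value using line protocol type conventions."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_line(row: dict, measurement: str, tag_columns: list, timestamp_column: str) -> str:
    """Hypothetical helper: build one line protocol line from a row dictionary."""
    tags = ",".join(
        f"{escape_key(k)}={escape_key(str(row[k]))}" for k in sorted(tag_columns)
    )
    # Every column that is neither a tag nor the timestamp becomes a field.
    fields = ",".join(
        f"{escape_key(k)}={format_field(v)}"
        for k, v in row.items()
        if k not in tag_columns and k != timestamp_column
    )
    # Integer arithmetic avoids float rounding when computing nanoseconds.
    delta = row[timestamp_column] - EPOCH
    nanos = (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    head = f"{name},{tags}" if tags else name
    return f"{head} {fields} {nanos}"


# Sample row with made-up values.
row = {
    "trainer": "ash", "id": "0006", "num": "1",
    "name": "Charizard", "level": 36, "attack": 84.0,
    "timestamp": datetime(2024, 1, 1, tzinfo=timezone.utc),
}
line = row_to_line(row, "caught", ["trainer", "id", "num"], "timestamp")
print(line)
```

The sketch also shows why the timestamp column has to be named explicitly: without it, the converter has no way to tell which column should become the trailing timestamp and which should be written as ordinary fields.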

Custom Arrow Flight headers

Another requested feature was support for custom Arrow Flight call options in queries. This lets users familiar with Arrow Flight set the underlying configuration parameters. A simple example is setting the timeout for a particular query:

df = client.query(database="pokemon-codex", query=sql, language='sql', mode='polars', timeout=5)

Bugs and miscellaneous

Lastly, here is a list of minor changes by version:

Version        Change
0.3.4 / 0.3.3  Merged the V2 Write API into V3 and removed the V2 client library as a dependency
0.3.4 / 0.3.3  Added custom port declaration for clustered users
0.3.2          Fixed an issue with Pandas as an optional dependency
0.3.1          Added flight errors readme
0.3.1          Added community and cookbook example
0.3.0          Added custom certificates parameter (to fix a Windows-based gRPC SSL issue)

What’s next?

We hope you find the new features added to the InfluxDB 3.0 Python Client library useful. If you have any feature requests or bugs to report, please do not hesitate to open an issue via the client repo. We are always looking for community contributors for our 3.0 client libraries. You can always discuss your contribution with us on Slack.