Client Library Deep Dive: Python (Part 2)

Working with the new InfluxDB 3.0 Python CLI and Client Library

Okay, we are back for Part 2! Last time we discussed the new community Python library for InfluxDB 3.0. If you missed it, you can also watch it in video form.


Now for Part 2, let’s talk about a bolt-on application built with the client library at its core: the InfluxDB 3.0 Python CLI.

Python CLI

Okay, so following the same format as before, what were the reasons for building the CLI? Well, there are two primary reasons:

  1. We wanted to give users a data browsing tool that leveraged the new Flight endpoint. Python gave us the opportunity to prototype fast before we invested work in a more robust CLI offering. It also allowed us to leverage some interesting data manipulation libraries that could extend the scope of the Python CLI.

  2. We wanted a robust way to test the newly created InfluxDB 3.0 Python Client Library; as you will see, the CLI exercises most of its tooling and functionality.

Install

Let’s talk about the installation process because, I must admit, Python doesn’t provide the most user-friendly packaging and deployment methods unless you use it daily. I recommend installing the CLI in a Python Virtual Environment first for test purposes:

$ python3 -m venv ./.venv
$ source .venv/bin/activate
$ pip install --upgrade pip
$ pip install influxdb3-python-cli

This set of commands creates our Virtual Python Environment, activates it, updates our Python package installer, and finally installs the new CLI.

If you would like to graduate from a Python Virtual Environment and move the CLI onto your path, you can do so with a sudo install (be careful here not to cause permission issues with system packages):

sudo python3 -m pip install influxdb3-python-cli

Creating a CLI config

The first thing you want to do is create a connection config. This feature acts like the current InfluxDB influx CLI, saving your connection credentials for later use.

influx3 create config \
--name="poke-dex" \
--database="pokemon-codex" \
--host="us-east-1-1.aws.cloud2.influxdata.com" \
--token="<your token>" \
--org="<your org ID>"
  • --name: A name to describe your connection config. This must be unique.
  • --token: Authenticates the client to read and write with InfluxDB Cloud Serverless or Dedicated. Note: you need a token with both read and write permissions if you wish to use both features.
  • --host: The InfluxDB host. This should be only the domain, without the protocol (https://).
  • --org: Cloud Serverless still requires your organization ID for writing data to 3.0. Dedicated users can use an arbitrary string.
  • --database: The database you wish to query from and write to.

Config commands

Config commands also exist to activate, update, delete, and list current active configs:

influx3 config update --name="poke-dex" --host="new-host.com"

The update subcommand updates an existing configuration. The --name parameter is required to specify which configuration to update; all other parameters (--host, --token, --database, --org, --active) are optional.

influx3 config use --name="poke-dex"

The use subcommand sets a specific configuration as the active one. The --name parameter is required to specify which configuration to use.

influx3 config delete --name="poke-dex"

The delete subcommand deletes a configuration. The --name parameter is required to specify which configuration to delete.

influx3 config list

The list subcommand lists all saved configurations.

Writing and querying

You can use the CLI to either directly call the application, followed by the commands you wish to run, or run it through an interactive REPL. I personally believe the REPL approach provides a better flow, so let’s demo some of the features.

Once you’ve created your config, simply enter the following to activate the REPL:

influx3

Which leads to:

influx3
InfluxDB 3.0 CLI.

(>)

Query

Let’s first take a look at the query options. Within the REPL you have three query options: SQL, InfluxQL, and ChatGPT (more on this later). Let’s drop into the SQL REPL and run a basic query against the Trainer data we generated in the previous blog:

InfluxDB 3.0 CLI.

(>) sql
(sql >) SELECT * FROM caught

Now, I wouldn’t normally recommend querying without some form of time-based WHERE clause, but I wanted to highlight how the CLI handles large datasets. It uses mode="chunk" from the Python Client Library to break large result sets into manageable Arrow batches. From there we have three options.

  1. We can either hit TAB to see the next portion of data, if one exists.

  2. Press F to save the current Arrow batch to a file type of our choosing (JSON, CSV, Parquet, ORC, Feather).

  3. Press CTRL-C to return back to the SQL REPL.

Let’s take a look at option 2:

| 3961 |       82 | Venusaur                  |        83 |   80 | 0003 |      12 |     7 |      80 | 2023-07-06 13:41:36.588000 | ash       | Grass    | Poison   |
| 3962 |       64 | Dratini                   |        45 |   41 | 0147 |       6 |     7 |      50 | 2023-07-06 14:30:32.519000 | jessie    | Dragon   |          |

Press TAB to fetch next chunk of data, or F to save current chunk to a file
Enter the file name with full path (e.g. /home/user/sample.json): ~/Desktop/all-trainer-data.csv
Data saved to ~/Desktop/all-trainer-data.csv.

Here is a sample of the CSV file created:

"attack","caught","defense","hp","id","level","num","speed","time","trainer","type1","type2"
49,"Bulbasaur",49,45,"0001",12,"1",45,2023-07-06 14:30:41.886000000,"ash","Grass","Poison"
62,"Ivysaur",63,60,"0002",7,"1",60,2023-07-06 14:30:32.519000000,"ash","Grass","Poison"
62,"Ivysaur",63,60,"0002",8,"1",60,2023-07-06 14:30:38.519000000,"ash","Grass","Poison"

Once we reach the end of our dataset, it prompts us to press ENTER to drop back into the SQL REPL. Just remember if you feel like you’re pressing TAB forever, you can always drop out of the query with CTRL-C.

Now, let’s look at a more interesting example with the InfluxQL REPL:

(sql >) exit
(>) influxql
(influxql >) SELECT count(caught) FROM caught WHERE time > now() - 2d GROUP BY trainer
|    | iox::measurement   | time                | trainer   |   count |
|---:|:-------------------|:--------------------|:----------|--------:|
|  0 | caught             | 1970-01-01 00:00:00 | ash       |     625 |
|  1 | caught             | 1970-01-01 00:00:00 | brock     |     673 |
|  2 | caught             | 1970-01-01 00:00:00 | gary      |     645 |
|  3 | caught             | 1970-01-01 00:00:00 | james     |     664 |
|  4 | caught             | 1970-01-01 00:00:00 | jessie    |     663 |
|  5 | caught             | 1970-01-01 00:00:00 | misty     |     693 |

(influxql >) SELECT count(caught) FROM caught WHERE time > now() - 2d  GROUP BY time(1d),trainer ORDER BY time
|    | iox::measurement   | time                | trainer   |   count |
|---:|:-------------------|:--------------------|:----------|--------:|
|  0 | caught             | 2023-07-05 00:00:00 | ash       |     nan |
|  1 | caught             | 2023-07-06 00:00:00 | ash       |     625 |
|  2 | caught             | 2023-07-07 00:00:00 | ash       |     148 |
|  3 | caught             | 2023-07-05 00:00:00 | brock     |     nan |
|  4 | caught             | 2023-07-06 00:00:00 | brock     |     673 |
|  5 | caught             | 2023-07-07 00:00:00 | brock     |     180 |
|  6 | caught             | 2023-07-05 00:00:00 | gary      |     nan |
|  7 | caught             | 2023-07-06 00:00:00 | gary      |     645 |
|  8 | caught             | 2023-07-07 00:00:00 | gary      |     155 |
|  9 | caught             | 2023-07-05 00:00:00 | james     |     nan |
| 10 | caught             | 2023-07-06 00:00:00 | james     |     664 |
| 11 | caught             | 2023-07-07 00:00:00 | james     |     157 |
| 12 | caught             | 2023-07-05 00:00:00 | jessie    |     nan |
| 13 | caught             | 2023-07-06 00:00:00 | jessie    |     663 |
| 14 | caught             | 2023-07-07 00:00:00 | jessie    |     144 |
| 15 | caught             | 2023-07-05 00:00:00 | misty     |     nan |
| 16 | caught             | 2023-07-06 00:00:00 | misty     |     693 |
| 17 | caught             | 2023-07-07 00:00:00 | misty     |     178 |

We will save this one as a Parquet file for later.

Write

Moving on from querying, let’s talk about the write functionality. This feature set isn’t as fleshed out as I would like it to be, but it covers the basics. We can drop into the write REPL and write data to InfluxDB using line protocol like so:

(influxql >) exit
(>) write
(write >) caught,id=0115,num=1,trainer=brock attack=125i,caught="KangaskhanMega Kangaskhan",defense=100i,hp=105i,level=13i,speed=100i,type1="Normal" 1688741473083000000
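For reference, each write follows the line protocol shape measurement,tag_set field_set timestamp. A quick pure-Python sketch that assembles a line like the one above (a hypothetical helper for illustration, not part of the CLI; it only handles string and integer fields):

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Assemble a line protocol string: measurement,tags fields timestamp."""
    tag_set = ",".join(f"{k}={v}" for k, v in tags.items())
    field_set = ",".join(
        # Strings are quoted; integers get the trailing "i" marker
        f'{k}="{v}"' if isinstance(v, str) else f"{k}={v}i"
        for k, v in fields.items()
    )
    return f"{measurement},{tag_set} {field_set} {timestamp_ns}"

line = to_line_protocol(
    "caught",
    {"id": "0115", "num": "1", "trainer": "brock"},
    {"attack": 125, "type1": "Normal"},
    1688741473083000000,
)
print(line)
```

Tags are always strings, fields are typed, and the timestamp is in nanoseconds by default.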

Next, let’s have a look at the write_file feature. For this we need to drop out of the REPL entirely and pass flags when calling ‘influx3’. Let’s load our count results into a new table:

(write >) exit
(>) exit

Exiting …

influx3 write_file --help
usage: influx3 write_file [-h] --file FILE [--measurement MEASUREMENT] --time TIME [--tags TAGS]

options:
  -h, --help            show this help message and exit
  --file FILE           the file to import
  --measurement MEASUREMENT
                        Define the name of the measurement
  --time TIME           Define the name of the time column within the file
  --tags TAGS           (optional) array of column names which are tags. Format should be: tag1,tag2

influx3 write_file --file ~/Desktop/count.parquet --time time --tags trainer --measurement summary

Here is the result:

(influxql >) SELECT count, trainer, time  FROM summary
|    | iox::measurement   | time                |   count | trainer   |
|---:|:-------------------|:--------------------|--------:|:----------|
|  0 | summary            | 2023-07-05 00:00:00 |     nan | ash       |
|  1 | summary            | 2023-07-05 00:00:00 |     nan | brock     |
|  2 | summary            | 2023-07-05 00:00:00 |     nan | gary      |
|  3 | summary            | 2023-07-05 00:00:00 |     nan | james     |
|  4 | summary            | 2023-07-05 00:00:00 |     nan | jessie    |
|  5 | summary            | 2023-07-05 00:00:00 |     nan | misty     |
|  6 | summary            | 2023-07-06 00:00:00 |     625 | ash       |
|  7 | summary            | 2023-07-06 00:00:00 |     673 | brock     |
|  8 | summary            | 2023-07-06 00:00:00 |     645 | gary      |
|  9 | summary            | 2023-07-06 00:00:00 |     664 | james     |
| 10 | summary            | 2023-07-06 00:00:00 |     663 | jessie    |
| 11 | summary            | 2023-07-06 00:00:00 |     693 | misty     |
| 12 | summary            | 2023-07-07 00:00:00 |     148 | ash       |
| 13 | summary            | 2023-07-07 00:00:00 |     180 | brock     |
| 14 | summary            | 2023-07-07 00:00:00 |     155 | gary      |
| 15 | summary            | 2023-07-07 00:00:00 |     157 | james     |
| 16 | summary            | 2023-07-07 00:00:00 |     144 | jessie    |
| 17 | summary            | 2023-07-07 00:00:00 |     178 | misty     |

Experimental feature (ChatGPT)

So with ChatGPT and OpenAI being all the rage these days, I looked to see if their Python package could benefit the CLI. Interestingly, it does. Because InfluxDB has been open source since its inception, ChatGPT has become pretty well-versed in building InfluxQL queries. Take a look at this example:

(chatgpt >) give me a list of the top 10 caught with an attack higher than 100 from caught
Run InfluxQL query: SELECT * FROM caught WHERE attack > 100 LIMIT 10
|    | iox::measurement   | time                       |   attack | caught                    |   defense |   hp |   id |   level |   num |   speed | trainer   | type1    | type2   |
|---:|:-------------------|:---------------------------|---------:|:--------------------------|----------:|-----:|-----:|--------:|------:|--------:|:----------|:---------|:--------|
|  0 | caught             | 2023-07-06 13:09:36.095000 |      110 | Dodrio                    |        70 |   60 | 0085 |      19 |     1 |     100 | jessie    | Normal   | Flying  |
|  1 | caught             | 2023-07-06 13:09:36.095000 |      125 | Pinsir                    |       100 |   65 | 0127 |       6 |     1 |      85 | brock     | Bug      |         |
|  2 | caught             | 2023-07-06 13:10:53.995000 |      130 | CharizardMega Charizard X |       111 |   78 | 0006 |       6 |     1 |     100 | brock     | Fire     | Dragon  |
|  3 | caught             | 2023-07-06 13:10:53.995000 |      150 | BeedrillMega Beedrill     |        40 |   65 | 0015 |      12 |     1 |     145 | jessie    | Bug      | Poison  |
|  4 | caught             | 2023-07-06 13:10:53.995000 |      102 | Nidoking                  |        77 |   81 | 0034 |      20 |     1 |      85 | gary      | Poison   | Ground  |
|  5 | caught             | 2023-07-06 13:10:53.995000 |      105 | Primeape                  |        60 |   65 | 0057 |      16 |     1 |      95 | misty     | Fighting |         |
|  6 | caught             | 2023-07-06 13:10:53.995000 |      120 | Golem                     |       130 |   80 | 0076 |       8 |     1 |      45 | ash       | Rock     | Ground  |
|  7 | caught             | 2023-07-06 13:10:53.995000 |      105 | Muk                       |        75 |  105 | 0089 |       5 |     1 |      50 | brock     | Poison   |         |
|  8 | caught             | 2023-07-06 13:10:53.995000 |      105 | Muk                       |        75 |  105 | 0089 |      19 |     1 |      50 | james     | Poison   |         |
|  9 | caught             | 2023-07-06 13:10:53.995000 |      105 | Muk                       |        75 |  105 | 0089 |      16 |     2 |      50 | james     | Poison   |         |

This feature currently uses GPT-3.5 and requires an OpenAI API token. If you would like instructions on how to use this feature, check out this part of the README.
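Conceptually, a feature like this just prompts the model with schema context plus the user’s question and runs whatever InfluxQL comes back. A hypothetical prompt-builder sketch (the wording and names are my assumptions, not the CLI’s actual prompt):

```python
def build_influxql_prompt(question, measurement, columns):
    """Hypothetical prompt asking the model to emit only an InfluxQL query."""
    return (
        "You translate questions into InfluxQL.\n"
        f"Measurement: {measurement}\n"
        f"Columns: {', '.join(columns)}\n"
        f"Question: {question}\n"
        "Reply with only the InfluxQL query."
    )

prompt = build_influxql_prompt(
    "give me a list of the top 10 caught with an attack higher than 100",
    "caught",
    ["attack", "caught", "trainer", "time"],
)
print(prompt.splitlines()[1])  # Measurement: caught
```

The returned query string would then be executed through the same InfluxQL path as the regular REPL.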

Future hopes

The future is bright for the Python CLI as our development team pushes forward with tooling for InfluxDB 3.0. For now, the scope is to keep it as a bolt-on tool for Python developers and those who want an easily extendable CLI. Here is my current laundry list for the project:

| Feature                                                                              | Status |
|:-------------------------------------------------------------------------------------|:-------|
| Improve OpenAI functionality: upgrade to GPT-4, add function calling, extend to SQL  | TO DO  |
| Find a better way to package and distribute; currently looking into PyInstaller      | TO DO  |
| Extended write functionality                                                         | TO DO  |
| Post-query exploration support (Pandas functions)                                    | TO DO  |
| Delta Sharing integration                                                            | TO DO  |

Wrapping up

So there you have it: Part 2 done and dusted. I really enjoyed writing this blog series on both the Python Client Library and CLI; having such a heavy hand in their development makes writing about them far more exciting and easy. I hope these blogs inspire you to contribute to our new community-based libraries and tooling. If you want to chat about getting involved, you can reach me via Slack or Discourse.