Obtaining and Storing Time Series Data with Python

Navigate to:

This article was originally published in The New Stack and is reposted here with permission.

In this tutorial we’ll learn how to use Python to get time series data from the OpenWeatherMap API and convert it to a Pandas DataFrame. Next we’ll write that data to InfluxDB, a time-series data platform, with the InfluxDB Python Client. We’ll convert the JSON response from our API call to a Pandas DataFrame because I find that that’s the easiest way to write data to InfluxDB. We’re writing to InfluxDB because it’s a purpose-built database that’s meant to handle the high ingest requirements and of time-series data.

Requirements

This tutorial was executed on a macOS system with Python 3 installed via Homebrew. I recommend setting up additional tooling like virtualenv, pyenv, or conda-env to simplify Python and Client installations. Otherwise, the full requirements are here:

influxdb-client=1.30.0
pandas=1.4.3
requests>=2.27.1

This tutorial also assumes that you have either created a Free Tier InfluxDB Cloud account or are using InfluxDB OSS. It also assumes that you have:

Finally, this tutorial requires that you have created an account with OpenWeatherMap and have [created a token (https://home.openweathermap.org/api_keys).

Request weather data

First, we need to request our data. We’ll use the request library to return hourly weather data from a specified longitude and latitude with the OpenWeatherMap API.

# Get time series data from OpenWeatherMap API
params = {'lat':openWeatherMap_lat, 'lon':openWeatherMap_lon, 'exclude': "minutely,daily", 'appid':openWeatherMap_token}
r = requests.get(openWeather_url, params = params).json()
hourly = r['hourly']

Convert the Data to a Pandas DataFrame

Next, convert the JSON data to a Pandas DataFrame. We’ll also convert the timestamp from a Unix timestamp with second precision to a datetime object. This conversion is being made because the InfluxDB write method requires that the timestamp is in datetime object format. We’ll use this method next to write our data to InfluxDB. We also drop columns that we don’t want to write to InfluxDB.

# Convert data to Pandas DataFrame and convert timestamp to datetime object
df = pd.json_normalize(hourly)
df = df.drop(columns=['weather', 'pop'])
df['dt'] = pd.to_datetime(df['dt'], unit='s')
print(df.head)

Write the Pandas DataFrame to InfluxDB

Now instantiate the InfluxDB Python client library and write the DataFrame to InfluxDB. We specify a measurement name. A measurement contains data within a bucket. You can think of it as the second-highest hierarchical level of data organization in InfluxDB after a bucket.

You can also specify which columns to convert to tags with the data_frame__tag_columns parameter.

Since we haven’t specified any columns as tags, all of our columns will be converted into fields in InfluxDB. Tags are used to write metadata about your time-series data and can be used to query data subsets together more efficiently. Fields are where you store your actual time-series data within InfluxDB. This documentation covers these data concepts within InfluxDB in more detail.

# Write data to InfluxDB
with InfluxDBClient(url=url, token=token, org=org) as client:
   df = df
   client.write_api(write_options=SYNCHRONOUS).write(bucket=bucket,record=df,
       data_frame_measurement_name="weather",
       data_frame_timestamp_column="dt")

Full script

To review, let’s take a look at the full script all together. We followed the following steps:

  1. Import libraries
  2. Gather the following:
    • InfluxDB bucket
    • InfluxDB org
    • InfluxDB token
    • InfluxDB URL
    • OpenWeatherMap URL
    • OpenWeatherMap token
  3. Build your request.
  4. Convert the JSON response to a Pandas DataFrame.
  5. Drop any columns you don’t want to write to InfluxDB.
  6. Convert the timestamp column from unix time to a Pandas datetime object.
  7. Instantiate the InfluxDB Python Client Library.
  8. Write the DataFrame and specify the measurement name and timestamp column.
import requests
import influxdb_client
import pandas as pd
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS

bucket = "OpenWeather"
org = "<my_InfluxDB_org>" # or email you used to create your Free Tier InfluxDB Cloud account
token = "<my_InfluxDB_token"
url = "<my_InfluxDB_url>" # for example, https://us-west-2-1.aws.cloud2.influxdata.com/
openWeatherMap_token = "<my_OpenWeatherMap_token>"
openWeatherMap_lat = "33.44"
openWeatherMap_lon = "-94.04"
openWeather_url = "https://api.openweathermap.org/data/2.5/onecall"

# Get time series data from OpenWeatherMap API
params = {'lat':openWeatherMap_lat, 'lon':openWeatherMap_lon, 'exclude': "minutely,daily", 'appid':openWeatherMap_token}
r = requests.get(openWeather_url, params = params).json()
hourly = r['hourly']

# Convert data to Pandas DataFrame and convert timestamp to datetime object
df = pd.json_normalize(hourly)
df = df.drop(columns=['weather', 'pop'])
df['dt'] = pd.to_datetime(df['dt'], unit='s')
print(df.head)

# Write data to InfluxDB
with InfluxDBClient(url=url, token=token, org=org) as client:
   df = df
   client.write_api(write_options=SYNCHRONOUS).write(bucket=bucket,record=df,
       data_frame_measurement_name="weather",
       data_frame_timestamp_column="dt")

Query the data

Now that we’ve written our data to InfluxDB, we can use the InfluxDB UI to query for our data. Navigate to the Data Explorer (from the left-hand navigation bar). Use the Query Builder to select for the data that you want to visualize and the range that you want to visualize it for and then hit Submit.

Data Explorer-Query the Data

A default materialized view of our weather data. InfluxDB automatically aggregates the time-series data so that new users don’t accidentally query for too much data and get a timeout.

Pro Tip: InfluxDB automatically downsamples your data when you query for it with the Query Builder. To query for your raw data, navigate to the Script Editor to view the underlying Flux query. Flux is the native query and scripting language for InfluxDB, which can be used for analyzing and creating forecasts with your time-series data. Uncomment or delete the line with the aggregateWindow() function to view your raw data.

data explorer-veiw raw data

Navigate to the Script Editor and uncomment or delete the aggregateWindow() function to view the raw weather data.

Conclusion

I hope this blog post inspires you to take advantage of the InfluxDB Python Client library to obtain and store time-series data in InfluxDB. If you want to learn more about using the Python Client Library to query data from InfluxDB, I encourage you to take a look at this post. It’s also worth noting that you can use Flux to obtain data from the OpenWeatherMap API and store it with InfluxDB with a task. If you’re using InfluxDB Cloud this means that this Flux script will be hosted and executed regularly, so you can get a reliable stream of weather data into your instance. To learn more about how to obtain weather data with Flux on a user-defined schedule read this post.