Matplotlib Tutorial - Learn How to Visualize Time Series Data With Matplotlib and InfluxDB
Jun 22, 2022
This article was written by Armstrong Asenavi. Scroll down for author bio.
A time series is a sequence of data points (observations) arranged chronologically, usually at equally spaced points in time. Notable examples of time series data include stock prices, a record of annual rainfall, and the number of customers using a bike-sharing app each day. Time series data often exhibits patterns, such as hotel prices that rise and fall with the seasons.
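To make the definition concrete, here is a minimal pandas sketch of an equally spaced time series: values indexed by timestamps one day apart. The rainfall values and the name rainfall are made up for illustration, and pandas is assumed to be installed.

```python
import pandas as pd

# Hypothetical daily rainfall readings in mm (made-up values), one per day
rainfall = pd.Series(
    [0.0, 2.5, 11.0, 0.0, 4.2],
    index=pd.date_range("2022-01-01", periods=5, freq="D"),
    name="rainfall_mm",
)

print(rainfall)
# pandas can infer the regular spacing from the timestamps
print(rainfall.index.inferred_freq)  # D
```

The DatetimeIndex is what makes this a time series rather than a plain list of numbers: pandas can sort, resample, and plot the values by time.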
What is InfluxDB?
InfluxDB is an open source, non-relational time series database (TSDB) written in the Go programming language and optimized for high availability and fast reads and writes of time series data. Fast storage and retrieval make it popular for many applications, such as monitoring production systems, storing data from IoT sensors, and performing real-time analytics. InfluxDB provides a wide range of access options, including SQL-like queries.
InfluxDB is well suited to data that needs to be processed quickly once it reaches the database. It relies on the Network Time Protocol (NTP) to synchronize time across systems, and it uses a simple indexing approach that gives it a speed advantage over many other TSDBs.
InfluxDB is useful for all kinds of applications that involve large amounts of time-stamped data. Here are some examples of how enterprises use it:
Monitoring signals in production lines. Texas Instruments (TI) uses InfluxDB to monitor manufacturing operations and detect anomalies before they become costly. TI monitors more than one thousand signals that could indicate emerging problems. The aim is to build apps that can identify critical issues and act autonomously.
Enhancing connection efficiency in Industrial Internet of Things (IIoT). ThingWorx Kepware uses InfluxDB to help companies connect their legacy assets to databases (on-prem and cloud). Companies can collect and store tag data easily and at scale, allowing managers to monitor events across multiple devices in real time.
Real-time monitoring of user statistics. Index Exchange uses InfluxDB to monitor user statistics. An InfluxDB cluster trigger enables real-time data storage, so analysts can access and process the data in real time at low cost.
Visualizing time series data with Matplotlib and InfluxDB
Now let’s take a closer look at how to use Matplotlib and InfluxDB to visualize time series data. You’ll learn the basics of Matplotlib and review some example charts, then walk through a tutorial that shows you how to set up InfluxDB Cloud and use the InfluxDB Python client to write and read data.
Introduction to Matplotlib
Matplotlib is an open source Python library for visualizing data as charts, plots, and graphs. It’s arguably the most popular plotting library for Python and is used by data scientists and machine learning engineers all around the world.
In Matplotlib, plots are hierarchical, nesting Python objects to create tree-like structures. A figure object encapsulates each plot, as pictured here:
This “figure” is the top-level container of the visualization. It can have multiple axes, which are basically individual plots inside the container.
Matplotlib’s pyplot interface makes it easier to create visualizations: it creates and manages the figure and axes for you, so you don’t have to configure them explicitly.
You can also find Python objects that control axes, tick marks, legends, titles, text boxes, the grid, and many others, all of which can be customized.
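To see the figure/axes hierarchy explicitly, rather than through pyplot’s implicit state, here is a minimal sketch using plt.subplots, which returns one Figure together with the Axes it contains. The titles and data are arbitrary placeholders chosen for illustration.

```python
import matplotlib.pyplot as plt

# Explicitly create one Figure containing two Axes (side-by-side plots)
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Each Axes is an independent plot inside the same Figure
ax_left.plot([1, 2, 3], [1, 4, 9])
ax_left.set_title("Left Axes")
ax_right.bar(["A", "B", "C"], [3, 1, 2])
ax_right.set_title("Right Axes")

# A Figure-level title sits above both Axes
fig.suptitle("One Figure, two Axes")

# The Figure keeps track of the Axes it contains
print(len(fig.axes))  # 2
plt.show()
```

The pyplot calls used in the examples below (plt.plot, plt.title, and so on) act on the "current" Axes, which pyplot creates on demand; this explicit style is handy once a figure holds more than one plot.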
The anatomy of a Matplotlib plot looks like this:
Matplotlib can be installed using either the pip or conda package manager, depending on your preference:
pip install matplotlib
conda install matplotlib
Then install NumPy (and pandas, which is imported below) as follows:
pip install numpy pandas
It’s also handy to use the alias plt to reference the imported pyplot submodule:
import matplotlib.pyplot as plt
import numpy as np  # for creating arrays
import pandas as pd  # for manipulating dataframes
With Matplotlib, you can create all kinds of visualizations, such as bar plots, pie charts, radar plots, histograms, and scatter plots. Here are a few examples showing how to create some basic chart types:
plt.plot([1, 2, 3], label='Label 1')
plt.plot([2, 4, 3], label='Label 2')
# Add title
plt.title('Two-way line plot')
plt.legend()
plt.show()
# Create a Figure with dpi set to 100
plt.figure(dpi=100)
# Fix the random state for reproducibility
np.random.seed(100)
n = 20
x = np.random.rand(n)
y = np.random.rand(n)
colors = np.random.rand(n)
area = (30 * np.random.rand(n))**2
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
# Add title
plt.title('Scatter plot')
plt.show()
labels = ['A', 'B', 'C', 'D']
x = np.arange(len(labels))
width = 0.4
plt.bar(x - width / 2, [20, 25, 40, 10], width=width)
plt.bar(x + width / 2, [30, 15, 30, 20], width=width)
# Ticks and tick labels must be set manually
plt.xticks(x)
ax = plt.gca()
ax.set_xticklabels(labels)
# Add axis labels and title
ax.set_xlabel("x-axis")
ax.set_ylabel("y-axis")
plt.title('Bar chart')
# Show plot
plt.show()
# Create figure
plt.figure(figsize=(8, 8), dpi=100)
plt.pie([19.5, 12.2, 4.9, 3.7], explode=(0.1, 0, 0, 0),
        labels=['USA', 'China', 'Japan', 'Germany'], autopct='%.0f%%')
# Add title
plt.title('Top 4 Global Economies by GDP')
# Show plot
plt.show()
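Histograms, listed among the chart types above, follow the same pyplot pattern. Here is a minimal sketch that bins 1,000 normally distributed values; the data is synthetic (generated with NumPy’s default_rng) and the bin count of 20 is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: 1,000 draws from a standard normal distribution
rng = np.random.default_rng(100)
values = rng.normal(loc=0.0, scale=1.0, size=1_000)

plt.figure(dpi=100)
# plt.hist returns the bin counts, the bin edges, and the drawn patches
counts, bin_edges, _ = plt.hist(values, bins=20, edgecolor="black")
# Add title and axis labels
plt.title('Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
```

With 20 bins there are 21 bin edges, and the counts add up to the number of values plotted.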
Setting up InfluxDB
Now let’s dive into the details of the tutorial, which will focus on the cloud version of InfluxDB. It’s very easy to get started, thanks to various modules for collecting, evaluating, and visualizing data.
To begin, sign up as a free user on the InfluxDB Cloud website. Be sure to verify your account by clicking on the link sent to your email. Then log in and select a cloud provider (AWS, Azure, or GCP). Note that the free version offers limited reads and writes, up to 10,000 data sets, and a retention policy of thirty days.
After signing up, you should find yourself on the dashboard page:
Next, navigate to the Buckets section and create a bucket:
Navigate to the API Token section and create an All Access API Token. This is important because you will write data to your bucket; an All Access Token will allow you to read and write data into a bucket without limitation.
Then you’ll want to create the required folders, set up a Python virtual environment, and install required packages:
# Create a directory
mkdir influxDB-Tutorial
# Navigate into the folder
cd influxDB-Tutorial
Next, create a virtual environment using Python’s built-in venv module, which is the recommended way to create virtual environments on Python 3.6 and later. See the Python venv documentation for more information.
# Create a virtual environment named venv (venv ships with Python 3, so it does not need to be installed with pip)
python -m venv venv
# Activate the virtual environment (macOS/Linux)
source venv/bin/activate
# On Windows (Git Bash), activate it with: source venv/Scripts/activate
# Install the InfluxDB client
pip install influxdb-client
# Create a file named __init__.py
touch __init__.py
# Create a .env file to store credentials
touch .env
# Create a .gitignore file and list .env in it so your credentials are not made public
echo ".env" >> .gitignore
# Install python-dotenv to read variables from the .env file
pip install python-dotenv
# Create folders for storing data and images
mkdir data
mkdir img
# Install yfinance to collect stock data
pip install yfinance
# Install matplotlib
pip install matplotlib
Store your credentials (token, bucket, and organization) in the .env file. Remember to include the .env file in the .gitignore file so your credentials aren’t committed:
INFLUX_TOKEN="All access token from InfluxDB"
ORG="email you registered with on InfluxDB"
BUCKET="name of your bucket"
In the __init__.py file, start by importing the dotenv and os modules so you can read the credentials from the .env file. Then go back to the InfluxDB Cloud UI and open the Sources section. Under the client libraries, select “Python.” Copy the code under Initialize the Client and paste it into the __init__.py file:
# Import modules
from dotenv import load_dotenv
import os
from influxdb_client import InfluxDBClient, Point, WriteOptions
from influxdb_client.client.write_api import SYNCHRONOUS

# Load the credentials stored in .env
load_dotenv()
token = os.getenv('INFLUX_TOKEN')
org = os.getenv('ORG')
bucket = os.getenv('BUCKET')

# Use the URL shown in your InfluxDB Cloud UI; it depends on the cloud region you selected
client = InfluxDBClient(url="https://us-east-1-1.aws.cloud2.influxdata.com", token=token, org=org)
Next, run your Python file:
$ python __init__.py
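If there are no error messages, the client was created successfully. Note that creating the client does not send a request to InfluxDB Cloud, so a wrong token or URL may only surface when you first write or query data.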
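Because os.getenv simply returns None for a missing or misspelled entry in .env, it can help to check the settings before creating the client. The sketch below uses hypothetical helpers (missing_settings and require_settings are not part of influxdb-client); it relies only on the standard library and takes the environment as a mapping so it can be tried with a plain dict. Recent versions of influxdb-client also provide InfluxDBClient.ping(), which you could call afterward to check that the server is reachable.

```python
import os

# Hypothetical helper: the setting names this tutorial reads from .env
REQUIRED_SETTINGS = ("INFLUX_TOKEN", "ORG", "BUCKET")


def missing_settings(environ=os.environ, required=REQUIRED_SETTINGS):
    """Return the required setting names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


def require_settings(environ=os.environ):
    """Raise a RuntimeError listing any settings that are missing."""
    missing = missing_settings(environ)
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))


# A plain dict stands in for os.environ here
example_env = {"INFLUX_TOKEN": "abc123", "ORG": "me@example.com", "BUCKET": ""}
print(missing_settings(example_env))  # ['BUCKET']
```

In __init__.py, you would call require_settings() right after load_dotenv(), so a typo in .env fails immediately with a readable message instead of as an authentication error later.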
Using Matplotlib with InfluxDB
This tutorial uses adjusted closing prices from Yahoo Finance for four tech companies: Apple (AAPL), Amazon (AMZN), Google (GOOG), and Microsoft (MSFT), from January 1, 2020 to March 30, 2022. The idea is to visualize the price trend of the stocks over the COVID-19 period.
Downloading and saving the data
First, use the yfinance library to download the data:
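import yfinance as yf

tickers = ['AAPL', 'GOOG', 'AMZN', 'MSFT']
# Download only the adjusted closing prices for the tickers.
# auto_adjust=False keeps the 'Adj Close' column; newer yfinance releases
# default to auto_adjust=True, which drops it.
data = yf.download(tickers, start="2020-01-01", end="2022-03-30", auto_adjust=False)['Adj Close']
data.head()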
The imported data will appear as follows:
Then use this code to save the data as a CSV file:
# Save the data into the data folder as a CSV file
data.to_csv('data/stocks.csv')
Preparing to write the data
Next, use the following code to load the required libraries and link to the InfluxDB Python client:
# Import required packages and access the InfluxDB client
from collections import OrderedDict
from csv import DictReader
import rx
from rx import operators as ops
from dotenv import load_dotenv
import os
from influxdb_client import InfluxDBClient, Point, WriteOptions
from influxdb_client.client.write_api import SYNCHRONOUS

load_dotenv()
token = os.getenv('INFLUX_TOKEN')
org = os.getenv('ORG')
bucket = os.getenv('BUCKET')
client = InfluxDBClient(url="https://us-east-1-1.aws.cloud2.influxdata.com", token=token, org=org)
Creating a function to write the data
Then use the code below to convert each CSV row into a Point and write the points into your InfluxDB bucket. This tutorial’s bucket is named finance-bucket, and the Flux queries later on refer to it by that name:
def parse_row(row: OrderedDict):
    """Parse a CSV row into a Point.

    The CSV file has the following columns: Date,AAPL,AMZN,GOOG,MSFT

    :param row: row of the CSV file
    :return: the CSV row converted to a Point
    """
    return Point("financial-analysis") \
        .tag("type", "stock-daily") \
        .field("aapl", float(row['AAPL'])) \
        .field("amzn", float(row['AMZN'])) \
        .field("goog", float(row['GOOG'])) \
        .field("msft", float(row['MSFT'])) \
        .time(row['Date'])


# Create a write API that batches points (up to 5,000 per batch, flushed every 1,000 ms)
write_api = client.write_api(write_options=WriteOptions(batch_size=5_000, flush_interval=1_000))

# Keep the file open until all rows have been written
with open('./data/stocks.csv', 'r') as csv_file:
    # Convert each row of stocks.csv into a Point
    data = rx \
        .from_iterable(DictReader(csv_file)) \
        .pipe(ops.map(parse_row))
    # Write the points into InfluxDB
    write_api.write(bucket=bucket, record=data)
    # Flush any pending batches and release resources
    write_api.close()
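Each Point is ultimately sent to InfluxDB as a line of line protocol: the measurement and tags, then the fields, then a timestamp. As an illustration only, here is a standard-library sketch of a hypothetical row_to_line_protocol function that builds a roughly equivalent line for one CSV row. It assumes dates in YYYY-MM-DD format, interpreted as midnight UTC with nanosecond precision; the sample prices are made up, and the client’s own formatting of floats and timestamps may differ in detail, so keep using Point for real writes.

```python
from datetime import datetime, timezone

TICKERS = ("AAPL", "AMZN", "GOOG", "MSFT")


def row_to_line_protocol(row):
    """Hypothetical sketch: build a line-protocol string for one CSV row."""
    # Interpret the date as midnight UTC and convert to nanoseconds since the epoch
    ts = datetime.strptime(row["Date"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    ts_ns = int(ts.timestamp()) * 1_000_000_000
    # One float field per ticker, named in lowercase as in parse_row
    fields = ",".join(f"{name.lower()}={float(row[name])}" for name in TICKERS)
    # measurement,tag_key=tag_value field_set timestamp
    return f"financial-analysis,type=stock-daily {fields} {ts_ns}"


# Made-up prices for illustration only (not real market data)
sample_row = {"Date": "2020-01-02", "AAPL": "73.5", "AMZN": "1898.0",
              "GOOG": "1367.5", "MSFT": "155.0"}
print(row_to_line_protocol(sample_row))
```

Seeing the line makes the data model explicit: the tag (type=stock-daily) is indexed metadata, while the four prices are fields stored under one timestamp.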
Using Flux to query the data from InfluxDB Cloud
Once the data is in the bucket, you can query it with the Flux language and return the result as a pandas dataframe (df) using the following code:
query_with_multiple_fields = '''
from(bucket: "finance-bucket")
  |> range(start: 0, stop: now())
  |> filter(fn: (r) => r._measurement == "financial-analysis")
'''
# Output the data as a dataframe named df
df = client.query_api().query_data_frame(org=org, query=query_with_multiple_fields)
df.head()
As you can see, the dataframe has a flat (long) structure, with each field value in its own row:
Writing a Flux query to pivot the time series into a dataframe
Next, use Flux to query and pivot the data into a dataframe with all fields presented as columns:
query_with_pivot = '''
from(bucket: "finance-bucket")
  |> range(start: 0, stop: now())
  |> filter(fn: (r) => r._measurement == "financial-analysis")
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> keep(columns: ["_time", "aapl", "amzn", "goog", "msft"])
'''
# Put the data in a dataframe
df = client.query_api().query_data_frame(org=org, query=query_with_pivot)
df.head()
The output shows that the data is now in the intended wide format. However, as you can see in the screenshot below, the result also includes ‘result’ and ‘table’ columns, which you’ll drop in the next section.
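If the Flux pivot() step is unfamiliar, the same reshaping can be sketched locally in pandas. The small long-format frame below imitates the unpivoted query result (one row per _time and _field pair, with the value in _value); the prices are made up for illustration, and pandas’ DataFrame.pivot stands in for Flux’s pivot() here rather than reproducing it exactly.

```python
import pandas as pd

# Long format, like the unpivoted query result (made-up values, not real prices)
long_df = pd.DataFrame({
    "_time": pd.to_datetime(["2020-01-02", "2020-01-02", "2020-01-03", "2020-01-03"]),
    "_field": ["aapl", "msft", "aapl", "msft"],
    "_value": [73.5, 155.0, 72.8, 153.1],
})

# Analogous to Flux: pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
wide_df = long_df.pivot(index="_time", columns="_field", values="_value")
print(wide_df)
```

Each distinct _field becomes a column and each timestamp becomes one row. Doing the pivot in Flux, as above, means InfluxDB returns data that is already in this shape, so there is less reshaping to do in Python.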
Dropping irrelevant columns
Use the pandas drop method to remove the ‘result’ and ‘table’ columns, which are not relevant to this analysis. The code also converts _time to a datetime and sets it as the index:
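import pandas as pd

# Convert the _time column to datetime values
df["_time"] = pd.to_datetime(df["_time"].astype(str))
# Drop the columns added to the query result
df = df.drop(columns=["result", "table"])
# Use the timestamps as the index
df = df.set_index("_time")
df.head()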
The final dataframe is presented as follows:
Normalizing the dataframe to create professional charts
The dataframe shows that AMZN and GOOG stock prices are in the thousands, while AAPL and MSFT prices are in the tens and hundreds. Plotting the raw prices on the same axes would put the AMZN and GOOG lines near the top of the chart and squash the AAPL and MSFT lines near the bottom, making the trends hard to compare. To solve this problem, you can normalize each stock’s prices relative to its first day, so that every series starts at $1.
Use the following code to do so:
def normalize_df(df):
    """Normalize stock data w.r.t. the day-1 price, forcing the first day's price to start at $1."""
    return df / df.iloc[0, :]

data = normalize_df(df)
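Normalization divides every price by that stock’s first-day price, so the value on day t is p_t / p_0: 1.0 on day 1, 0.8 after a 20% drop, 1.3 after a 30% gain. Here is a small worked example of the same division used in normalize_df; the two price columns are made up for illustration and are not real market data.

```python
import pandas as pd

# Made-up prices for illustration only (not real market data)
prices = pd.DataFrame(
    {"aapl": [75.0, 60.0, 90.0], "amzn": [1900.0, 1710.0, 2470.0]},
    index=pd.to_datetime(["2020-01-02", "2020-03-23", "2020-07-01"]),
)

# Divide every row by the first row: each column now starts at 1.0
normalized = prices / prices.iloc[0, :]
print(normalized)
```

Even though the two made-up stocks differ in price by a factor of about 25, after normalization both are on the same scale: aapl goes 1.0, 0.8, 1.2 and amzn goes 1.0, 0.9, 1.3, which is what makes one chart of all four tickers readable.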
Visualizing the data with Matplotlib
Finally, use this code to create your time series plot of the stock prices:
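import matplotlib.pyplot as plt

# Apply the 'bmh' style before plotting so that it applies to the new figure
plt.style.use('bmh')
# Create the plot
ax = data.plot(figsize=(15, 10))
ax.set_xlabel("Date")
ax.set_ylabel("Adjusted Price (normalized to $1 on day 1)")
plt.title('Adjusted Closing Prices')
plt.legend(loc='upper left', fontsize=12)
plt.grid(True)
plt.tight_layout()
plt.show()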
After running the code, the output will look like this:
Analyzing the data
As you can see in the final plot, prices dipped temporarily between March and April 2020, when governments announced COVID-19 lockdown measures. After that, technology stocks performed relatively well despite the effects of COVID-19. For deeper insight, you could apply further time series analysis techniques, or use time series forecasting methods to make projections.
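One simple next step in time series analysis is smoothing with a rolling mean, which averages each value with the values just before it and makes the underlying trend easier to see. Here is a minimal pandas sketch; the normalized prices are made up for illustration, and the 3-day window is an arbitrary choice to keep the example small.

```python
import pandas as pd

# Made-up normalized prices for illustration only (not real market data)
normalized = pd.DataFrame(
    {"aapl": [1.00, 0.98, 0.95, 0.97, 1.02, 1.05]},
    index=pd.date_range("2020-03-02", periods=6, freq="D"),
)

# 3-day rolling mean: each value averages the current row and the two before it;
# the first two rows are NaN because the window is not yet full
smoothed = normalized.rolling(window=3).mean()
print(smoothed)
```

With the tutorial’s normalized dataframe, data.rolling(window=20).mean().plot() would draw a smoothed version of the chart; for daily trading data, a 20-row window covers roughly one trading month.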
In this tutorial, you learned the basics of time series visualization in Matplotlib, a popular plotting library for Python. You also learned how to set up InfluxDB Cloud, write data to a bucket, and read the data back into Python for analysis, all with simple Python scripts.
As you’ve seen, combining InfluxDB and Matplotlib gives you a lot of power and flexibility for creating efficient visualizations of time series data. You can access the full source code for this tutorial in this GitHub repo.
If you are interested in some other tutorials showing how to visualize data with InfluxDB, check out the links below:
Recharts Tutorial - Learn how to visualize IoT data using the Recharts charting library with InfluxDB
React Native Victory Tutorial - This tutorial shows how to use the React Native version of the Victory charting library, which allows you to use your web development experience to create native mobile applications.
Nivo Tutorial - In this article, you’ll learn about the fundamentals of the Nivo charting library and how to create various types of charts while pulling data from InfluxDB.
Plotly.js Tutorial - Use the Plotly.js charting library to display data stored in InfluxDB.
About the author: