Pandas Time Series: A Primer

Time series data is a fundamental part of numerous real-world applications, from stock market analysis to weather forecasting. Effectively managing, analyzing, and visualizing time series data is essential for extracting meaningful insights and making informed decisions. This is where pandas time series comes into play. It can help you organize, transform, and visualize data and examine details for a specific time period. In this post, we’ll see what pandas time series is, how it works, and the basic functions it provides using a real-world dataset.

What is pandas time series?

To understand pandas time series, it’s important to know what time series data is. It’s a set of data points recorded at specific time intervals over a period of time. Examples include a patient’s blood sugar levels collected over a few months, a stock price over the last few months, a store’s sales over the last few weeks, temperature readings, and social media activity over time.

Pandas offers extensive features and capabilities to work with time series data regardless of the domain. It builds on Python’s native datetime and dateutil modules and NumPy’s timedelta64 and datetime64 dtypes, and it combines numerous features from other Python libraries, like scikits.timeseries, to provide new functionality to analyze and manipulate time series data.

Combining the ease of use of dateutil and datetime modules and the vectorized interface and efficient storage of NumPy’s datetime64, pandas provides a Timestamp object. The library then makes a DatetimeIndex from these Timestamp objects to index a DataFrame or Series. This makes it easy to visualize, manipulate, and extract valuable information from time-stamped data and perform operations like resampling, grouping, and filtering.
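As a minimal sketch of the paragraph above, the snippet below builds one Timestamp and a small Series indexed by a DatetimeIndex. The dates and values are made up for illustration and are not taken from the Bitcoin dataset used later.

```python
import pandas as pd

# A single point in time
ts = pd.Timestamp("2023-08-21 14:30")

# Several timestamps form a DatetimeIndex, which can index a Series
idx = pd.DatetimeIndex(["2023-08-21", "2023-08-22", "2023-08-23"])
prices = pd.Series([100.0, 101.5, 99.8], index=idx)  # made-up values

print(ts.day_name())                # Monday
print(type(prices.index).__name__)  # DatetimeIndex
print(prices.loc["2023-08-22"])     # 101.5
```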

How pandas time series works

Before we delve into an example, it’s important to understand how time series works. Pandas can capture the following time-related concepts:

  • date times (a particular time and date with timezone support)
  • time spans (a point in time and its related frequency)
  • time deltas (absolute time duration)
  • date offsets (time duration that can be added to or subtracted from dates)

And it uses the following data structures to work with time series data:

  • the Timestamp type for timestamps with DatetimeIndex as the associated index structure
  • the Period type for time periods with PeriodIndex as the associated index structure (encodes fixed-frequency intervals based on datetime64 from NumPy)
  • the Timedelta type for time durations with TimedeltaIndex as the associated index structure
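
The four concepts listed above can be sketched with one object each. This is only an illustration with arbitrary dates; pd.DateOffset is used here as one example of a date offset.

```python
import pandas as pd

stamp = pd.Timestamp("2023-08-21 09:00", tz="UTC")  # date time with timezone support
period = pd.Period("2023-08", freq="M")             # time span: all of August 2023
delta = pd.Timedelta(hours=36)                      # absolute time duration
offset = pd.DateOffset(months=1)                    # calendar-aware date offset

print(period.start_time)  # first instant covered by the period
print(stamp + delta)      # 2023-08-22 21:00:00+00:00
print(stamp + offset)     # 2023-09-21 09:00:00+00:00
```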

The Timestamp and DatetimeIndex objects are the most fundamental of these. And while you can directly invoke these class objects, it’s common practice to use pd.to_datetime() to parse various formats.
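
Here is a short sketch of how pd.to_datetime() handles a few kinds of input. The date strings are arbitrary examples; the format argument is optional, but it removes ambiguity for day-first dates.

```python
import pandas as pd

# A single ISO 8601 string becomes a (timezone-aware) Timestamp
one = pd.to_datetime("2023-08-21T14:30:00Z")

# A list of strings becomes a DatetimeIndex
many = pd.to_datetime(["2023-08-21", "2023-08-22"])

# An explicit format tells pandas exactly how to read the string
explicit = pd.to_datetime("21/08/2023", format="%d/%m/%Y")

print(type(one).__name__, type(many).__name__)
print(explicit)
```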

Note that it’s conventional to set the time column as the index of the DataFrame or Series when working with time series data. Setting the index allows pandas to recognize your data as time series data. Once you do that, you can use time-based indexing and other functions to analyze your data. However, a Series or DataFrame can also hold the time component directly as data in a regular column.
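
The difference between keeping time as data and using it as the index can be sketched as follows. The DataFrame and its column names (when, value) are hypothetical.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "when": pd.to_datetime(["2023-08-20", "2023-08-21", "2023-09-01"]),
    "value": [1.0, 2.0, 3.0],  # made-up values
})

# Time as a regular column: datetime properties live under the .dt accessor
months = df_demo["when"].dt.month.tolist()

# Time as the index: rows can be selected with a partial date string
indexed = df_demo.set_index("when")
august = indexed.loc["2023-08"]

print(months)       # [8, 8, 9]
print(len(august))  # 2
```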

When to use it

You can use pandas time series for all kinds of data that follow a time-based structure, like sales records and stock prices. Some common use cases include:

  • identifying patterns and trends by analyzing financial market data
  • using historical time series data to forecast future values
  • monitoring sensor data over time, like humidity and temperature
  • tracking user activity on an application or website to identify usage patterns
  • studying economic indicators, GDP growth, and inflation rates

Pandas time series example

To illustrate how pandas time series works, we’ll work with a real-world example. For the following examples, I’m using Bitcoin’s historical data stored in an annotated .csv file. This file contains the Bitcoin pricing data for the last 30 days from the CoinDesk API. You can find this .csv file on GitHub.

How to create a time series

To create a time series, you first need to install Python and pandas, and then load the data from the .csv file.

import pandas as pd

#load in data

github_csv_url = "https://raw.githubusercontent.com/influxdata/influxdb2-sample-data/master/bitcoin-price-data/bitcoin-historical-annotated.csv"

df = pd.read_csv(github_csv_url, header = 3)

In this case, I’m setting the header to 3 because the header argument is zero-based, and I want the row at index 3 (the fourth line of the .csv file) as the column names. Then, you need to convert the date strings into Timestamp objects using the pd.to_datetime() function, passing the date/time column as the argument. In this case, I want to convert the _time column to datetime.
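
To see what the header argument does without downloading anything, here is a small sketch that reads an in-memory stand-in for an annotated .csv file. The three annotation lines and the single data row are invented for illustration and only mimic the general layout; they are not copied from the real file.

```python
import io
import pandas as pd

# Invented stand-in: three annotation lines, then the column names, then data
raw = io.StringIO(
    "#group,false,false\n"
    "#datatype,string,double\n"
    "#default,_result,\n"
    ",_time,_value\n"
    ",2023-08-21T00:00:00Z,26000.5\n"
)

# header=3 is zero-based: the line at index 3 supplies the column names,
# and the lines before it are skipped
demo = pd.read_csv(raw, header=3)

print(list(demo.columns))
print(demo.loc[0, "_value"])  # 26000.5
```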

# Convert to datetime

df['_time'] = pd.to_datetime(df['_time'])

You’ve now converted the dates from strings into Timestamp objects. To be sure that your data is in a datetime format, you can run a basic pandas datetime method on your data to see if it works. Here’s a simple one you can try:

# To check if datetime functions work

df.loc[0, '_time'].day_name()

If you don’t see an error and instead see the day of the week, you can be sure that you’ve created a time series.

How to index a time series

One way you can filter your data by date is to first create a filter (either as a separate variable or inline) and then pass the filter to df.loc[], like this:

# Access data at a specific timestamp using a filter

filt = (df['_time']>='2023-09')

df.loc[filt]

However, one nice feature about dates is that if you set the index for your DataFrame so that it uses the date, you can filter data by slicing instead. Here’s how you can set the index so that it uses the date column, which, in this case, is the _time column:

df.set_index('_time')

If the resulting data looks good, you can then make the change permanent by setting inplace to True.

df.set_index('_time', inplace=True)
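
The following sketch, on a small made-up DataFrame, shows why the first call only previews the result: set_index() returns a new DataFrame and leaves the original untouched. Reassigning the result is shown as an alternative to inplace=True.

```python
import pandas as pd

demo = pd.DataFrame({
    "_time": pd.to_datetime(["2023-08-21", "2023-08-22"]),
    "_value": [26000.0, 26100.0],  # made-up prices
})

# Returns a new DataFrame; demo itself still has _time as a column
preview = demo.set_index("_time")
assert "_time" in demo.columns

# Reassigning has the same effect as inplace=True
demo = demo.set_index("_time")

print(type(demo.index).__name__)  # DatetimeIndex
```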

How to slice a time series

With the index set, you can now slice the data like this:

#slicing

df['2023-07':'2023-08']
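
One detail worth knowing is that slicing with partial date strings includes both endpoints, so a slice ending at '2023-08' runs through the last moment of August. Here is a sketch on made-up daily data, using .loc to make the row selection explicit:

```python
import pandas as pd

# Made-up daily data that starts before July and ends after August
idx = pd.date_range("2023-06-29", "2023-09-02", freq="D")
demo = pd.DataFrame({"_value": range(len(idx))}, index=idx)

# Both months are included in full
summer = demo.loc["2023-07":"2023-08"]

print(summer.index.min())  # 2023-07-01 00:00:00
print(summer.index.max())  # 2023-08-31 00:00:00
print(len(summer))         # 62
```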


How to use time series for data analysis

Now that you have a basic understanding of how you can work with pandas time series, let’s go over a few basic operations for data analysis.

Calculating basic statistics

You can compute basic statistics for your time series, such as the minimum and maximum. In this case, I’m finding the maximum and minimum value of Bitcoin for August 21, 2023.

# Use .loc to select the rows for one day, then the _value column

max_value = df.loc['2023-08-21', '_value'].max()

min_value = df.loc['2023-08-21', '_value'].min()

print('max:', max_value, ',', 'min:', min_value)

You can also calculate the mean. In this case, I’m calculating the mean value of Bitcoin in July and August 2023:

mean = df['2023-07':'2023-08']['_value'].mean()

print('Mean value:', mean)

Or you can calculate the standard deviation in the value:

std_dev = df['_value'].std()

print(std_dev)


Resampling

You can also resample the data to a different time frequency, for example, daily, and then perform further operations, like calculating the maximum daily value of Bitcoin. Here’s how:

#resampling to find the daily maximum

df['_value'].resample('D').max()

Or you can calculate the weekly mean value:

#resampling to find the weekly mean of the numeric _value column

df['_value'].resample('W').mean()
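
To make the resampling mechanics easier to follow, here is a sketch on a tiny made-up Series with four readings per day. Each day’s readings are grouped into one bucket, and the aggregation is applied per bucket.

```python
import pandas as pd

# Made-up readings every six hours over two days
idx = pd.date_range("2023-08-21", periods=8, freq=pd.Timedelta(hours=6))
values = pd.Series([1.0, 5.0, 3.0, 2.0, 4.0, 8.0, 6.0, 7.0], index=idx)

daily_max = values.resample("D").max()
daily_mean = values.resample("D").mean()

print(daily_max.tolist())   # [5.0, 8.0]
print(daily_mean.tolist())  # [2.75, 6.25]
```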


Visualizing data

You can also create a graph to visualize the data. For instance, you can simply plot the daily maximum value of Bitcoin against the date, like this:

import matplotlib.pyplot as plt

# Calculate the daily high

daily_high = df['_value'].resample('D').max()

# Plotting the results

plt.figure(figsize=(12, 6))

# Plot the daily high on a single set of axes

plt.plot(daily_high.index, daily_high.values, marker='o', linestyle='-')

plt.title('Daily Max Price')

plt.xlabel('Date')

plt.ylabel('Value')

plt.tight_layout()

plt.show()

You can find all the code for this primer in this notebook.

FAQ

Does pandas support time series?

Yes, pandas provides robust support for time series data, allowing you to efficiently handle, manipulate, and analyze time-oriented data.

How to check time series in pandas

The easiest way to check a time series in pandas is by inspecting the index. If the index has date-time values, then it’s a time series. One way to do that is with the isinstance() function, like this:

# Check time series in pandas

is_time_series = isinstance(df.index, pd.DatetimeIndex)

print(is_time_series)

Alternatively, you can use the index’s .dtype attribute to confirm that it’s a datetime64 type. In this case, our index is _time, and its .dtype should be a datetime64 dtype.

df.index.dtype
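
Both checks can be sketched on a small made-up DataFrame. The helper pd.api.types.is_datetime64_any_dtype() is an additional option not used elsewhere in this post; it returns True for both timezone-naive and timezone-aware datetime data.

```python
import pandas as pd

idx = pd.to_datetime(["2023-08-21T00:00:00Z", "2023-08-22T00:00:00Z"])
demo = pd.DataFrame({"_value": [1.0, 2.0]}, index=idx)  # made-up values

is_dt_index = isinstance(demo.index, pd.DatetimeIndex)
is_dt_dtype = pd.api.types.is_datetime64_any_dtype(demo.index)

print(is_dt_index, is_dt_dtype)  # True True
print(demo.index.dtype)
```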



Summary

Pandas time series provides a great way of handling time-based data. In this primer, we’ve covered what it is, how it works, and when to use it. We’ve also covered the basics of creating, indexing, slicing, and analyzing time series using the Bitcoin pricing data for the last 30 days. You can now analyze and explore different time series datasets and unlock valuable insights for your projects across a range of domains.

This post was written by Nimra Ahmed. Nimra is a software engineering graduate with a strong interest in Node.js & machine learning. When she’s not working, you’ll find her playing around with no-code tools, swimming, or exploring something new.