Welcome to Part Three of this three-part blog post series. To get the most out of Part Three, I suggest reading Parts One and Two first.
In Part One, we covered:
- When to use Holt-Winters
- How Single Exponential Smoothing works
- A conceptual overview of optimization for Single Exponential Smoothing
- Extra: The Proof for Optimization of RSS for Linear Regression
In Part Two, we dove into:
- How Single Exponential Smoothing relates to Triple Exponential Smoothing/Holt-Winters
- How RSS relates to RMSE (root mean squared error)
- How RMSE is optimized for Holt-Winters using the Nelder-Mead method
In this current piece, Part Three, we explore:
- How you can use InfluxDB’s built-in Multiplicative Holt-Winters function to generate predictions on your time series data
- A list of learning resources
How to use InfluxDB’s built-in multiplicative Holt-Winters function to generate predictions on time series data
To keep things easy to follow, I've decided to use the Holt-Winters example from the InfluxDB documentation. The dataset for this example can be downloaded with:
curl https://s3.amazonaws.com/noaa.water-database/NOAA_data.txt -o NOAA_data.txt
The download consists of a collection of tide and current metrics from the National Oceanic and Atmospheric Administration. This dataset is a little outdated (from 2015). If you want to work with recent data, I encourage you to use the CO-OPS API for current data retrieval. Once you have downloaded NOAA_data.txt, you can write it to InfluxDB with:
influx -import -path=NOAA_data.txt -precision=s -database=NOAA_water_database
We use this query to inspect the data:
SELECT "water_level" FROM "NOAA_water_database"."autogen"."h2o_feet" WHERE "location"='santa_monica' AND time >= '2015-08-17 22:12:00' AND time <= '2015-08-28 03:00:00'
Upon visual inspection, we can see that our data has an offset of 348m (2015/08/17 20:00 to 2015/08/18 01:48). We also see strong seasonality starting at 2015/08/22, so we will use only the data after that point for our forecast.
SELECT "water_level" FROM "NOAA_water_database"."autogen"."h2o_feet" WHERE "location"='santa_monica' AND time >= '2015-08-22 22:12:00' AND time <= '2015-08-28 03:00:00'
We can also read the time between consecutive peaks off the graph.
We find that it's approximately 379m (2015/08/25 03:18 to 2015/08/25 09:36).
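If you'd rather not do the clock arithmetic in your head, here is a minimal Python sketch that converts the two timestamp pairs quoted above into minutes. The helper name minutes_between is my own and is not part of InfluxDB. Note that the second pair works out to 378 minutes, which is in line with the "approximately 379m" read off the graph.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M"

def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both formatted like 2015/08/25 03:18."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)

# Timestamp pairs taken from the visual inspection described in the text.
offset_minutes = minutes_between("2015/08/17 20:00", "2015/08/18 01:48")
peak_gap_minutes = minutes_between("2015/08/25 03:18", "2015/08/25 09:36")

print("offset (minutes):", offset_minutes)
print("time between peaks (minutes):", peak_gap_minutes)
```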
Now we’re ready to match the trends of the raw data by using the FIRST() function and grouping by the time-between-peaks duration after applying an offset. The FIRST() function returns the oldest field value associated with the field key.
The goal of the next query is to represent our data with as few points as possible. If we effectively summarize our data with few points, then we can use Holt-Winters effectively and efficiently. In each season we see two hills, one larger than the other. Each hill has a peak and a valley.
We then use the FIRST() function to pick a point at each peak and valley. We accomplish this by grouping by the time span between each peak and valley so that we can make sure that we don’t miss one.
SELECT FIRST("water_level") FROM "NOAA_water_database"."autogen"."h2o_feet" WHERE "location"='santa_monica' AND time >= '2015-08-22 22:12:00' AND time <= '2015-08-28 03:00:00' GROUP BY time(379m,348m)
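To build some intuition for what GROUP BY time(379m,348m) combined with FIRST() does, here is a rough Python sketch. It assumes that buckets are aligned to the Unix epoch and then shifted by the offset, which is my reading of the InfluxQL documentation rather than a copy of InfluxDB's code. The function names and the sample readings are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def bucket_start(ts, interval, offset):
    """Start of the bucket containing ts, assuming epoch-aligned buckets shifted by offset."""
    shifted = ts - EPOCH - offset
    # timedelta // timedelta floors to an int, so this also works before the epoch.
    return EPOCH + offset + (shifted // interval) * interval

def first_per_bucket(points, interval, offset):
    """Mimic FIRST(): keep the earliest value seen in each bucket."""
    firsts = {}
    for ts, value in sorted(points):
        firsts.setdefault(bucket_start(ts, interval, offset), value)
    return firsts

interval = timedelta(minutes=379)
offset = timedelta(minutes=348)
start = datetime(2015, 8, 22, 22, 12, tzinfo=timezone.utc)

# Made-up readings every 6 minutes, NOT the real NOAA values.
points = [(start + timedelta(minutes=6 * i), float(i)) for i in range(200)]

for bucket, value in first_per_bucket(points, interval, offset).items():
    print(bucket.isoformat(), value)
```

Each printed line is one bucket and the single value that represents it, which is the "as few points as possible" idea in miniature.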
Now we can clearly see that each season contains four turning points (two peaks and two valleys), and we've represented each one with a corresponding point.
I want to make one last comment about the FIRST() function. Data is valuable, but more is not necessarily better. Using an aggregate like MEAN() instead of FIRST() would average away the peaks and valleys, obscuring the shape of our data and producing a poor, damped prediction. Okay, now we're finally ready to use the HOLT_WINTERS() function.
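Before moving on, the MEAN() versus FIRST() point can be made concrete with a small Python sketch. It uses a synthetic sine wave as an idealized stand-in for the tide data, with four buckets per season and bucket edges that line up with the peaks and valleys (which is what the offset is for). The bucket size of 63 samples is a hypothetical choice of mine, not something taken from the dataset.

```python
import math

SAMPLES_PER_BUCKET = 63   # hypothetical: roughly one 379m bucket of 6-minute samples
BUCKETS_PER_SEASON = 4
samples_per_season = SAMPLES_PER_BUCKET * BUCKETS_PER_SEASON

# Synthetic "tide": one sine wave per season, shifted up so values stay positive.
series = [3.0 + math.sin(2 * math.pi * i / samples_per_season)
          for i in range(samples_per_season * 3)]

# Split the series into consecutive, equally sized buckets.
buckets = [series[i:i + SAMPLES_PER_BUCKET]
           for i in range(0, len(series), SAMPLES_PER_BUCKET)]

firsts = [b[0] for b in buckets]              # what FIRST() would keep
means = [sum(b) / len(b) for b in buckets]    # what MEAN() would keep

first_range = max(firsts) - min(firsts)
mean_range = max(means) - min(means)

print("FIRST range:", round(first_range, 3))
print("MEAN range: ", round(mean_range, 3))
```

The spread of the bucket means is noticeably smaller than the spread of the first values, which is the damping described above.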
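If we want to predict 10 points, with a seasonal pattern of 4 points per season to match the four points we found above, we would write the following. The _WITH_FIT variant returns the fitted values alongside the predictions: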
SELECT HOLT_WINTERS_WITH_FIT(FIRST("water_level"),10,4) FROM "NOAA_water_database"."autogen"."h2o_feet" WHERE "location"='santa_monica' AND time >= '2015-08-22 22:12:00' AND time <= '2015-08-28 03:00:00' GROUP BY time(379m,348m)
Congratulations! You’ve made a forecast with Holt-Winters. That’s all there is to it.
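If you're curious what a forecast like this involves under the hood, here is a minimal Python sketch of multiplicative Holt-Winters. It is not InfluxDB's implementation: as covered in Part Two, the smoothing parameters are normally optimized with Nelder-Mead, whereas this sketch simply assumes fixed values for alpha, beta, and gamma and uses a very simple initialization. The function name and the sample series are made up for illustration.

```python
def holt_winters_multiplicative(y, m, horizon, alpha=0.5, beta=0.1, gamma=0.1):
    """Forecast `horizon` points past the end of y, with m points per season.

    alpha, beta and gamma are fixed, hypothetical smoothing parameters;
    they are NOT fitted to the data in this sketch.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Simple initialization from the first two seasons.
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    seasonals = [y[i] / level for i in range(m)]

    for t in range(m, len(y)):
        prev_level, prev_trend = level, trend
        season = seasonals[t % m]  # seasonal factor from one season ago
        level = alpha * (y[t] / season) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonals[t % m] = (gamma * (y[t] / (prev_level + prev_trend))
                            + (1 - gamma) * season)

    n = len(y)
    # Extend level and trend linearly, then rescale by the matching seasonal factor.
    return [(level + h * trend) * seasonals[(n + h - 1) % m]
            for h in range(1, horizon + 1)]

# Made-up series with four points per season, NOT the NOAA data.
history = [2.0, 4.0, 2.0, 1.0] * 5
forecast = holt_winters_multiplicative(history, m=4, horizon=10)
print([round(v, 3) for v in forecast])
```

The call at the bottom mirrors the arguments in the query above: 10 predicted points and 4 points per season.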
P.S. Fun fact: if you set the period (the last argument) to 0, then you transform Holt-Winters from Triple Exponential Smoothing to Double Exponential Smoothing. So, if your data has trend but doesn't have seasonality, fret not: you can use the HOLT_WINTERS() function for your forecasting needs as well.
A list of learning resources
I want to share all the resources that I used to learn about Holt-Winters, both to supplement your understanding and to credit the authors for their wonderful work. Thank you.
- Forecasting: Principles and Practice
- Results From Comparing Classical and Machine Learning Methods for Time Series
- Holt-Winters Resources
- Influx Resources
- Minimizing squared error to regression line
I hope this tutorial helps get you started on your forecasting journey. If you have any questions, please post them on the community site or tweet us @InfluxDB.