What is time series data?
Time series data, also referred to as time-stamped data, is a sequence of data points indexed in time order. Time-stamped is data collected at different points in time.
These data points typically consist of successive measurements made from the same source over a time interval and are used to track change over time.
Time series data is a collection of observations obtained through repeated measurements over time. Plot the points on a graph, and one of your axes would always be time.
Time series data is everywhere, since time is a constituent of everything that is observable. As our world gets increasingly instrumented, sensors and systems are constantly emitting a relentless stream of time series data. Such data has numerous applications across various industries. Let’s put this in context through some examples.
Time series data can be useful for:
- Tracking daily, hourly, or weekly weather data
- Tracking changes in application performance
- Medical devices to visualize vitals in real time
- Tracking network logs
Here are some examples of time series data in greater detail.
Time series examples
Weather records, economic indicators and patient health evolution metrics — all are time series data. Time series data could also be server metrics, application performance monitoring, network data, sensor data, events, clicks and many other types of analytics data.
Notice how time — depicted at the bottom of the below chart — is the axis.
In the next chart below, note time as the axis over which stock price changes are measured. In investing, a time series tracks the movement of data points, such as a security’s price over a specified period of time with data points recorded at regular intervals. This can be tracked over the short term (such as a security’s price on the hour over the course of a business day) or the long term (such as a security’s price at close on the last day of every month over the course of five years).
The cluster monitoring example below, depicting disk ops write and usage data, would be familiar to Network Operation Center teams. Remember that monitoring data is time series data.
Another familiar example of time series data is patient health monitoring, such as in an electrocardiogram (ECG), which monitors the heart’s activity to show whether it is working normally.
In addition to being captured at regular time intervals, time series data can be captured whenever it happens — regardless of the time interval, such as in logs. Logs are a registry of events, processes, messages and communication between software applications and the operating system. Every executable file produces a log file where all activities are noted. Log data is an important contextual source to triage and resolve issues. For example, in networking, an event log helps provide information about network traffic, usage and other conditions.
Traces (a list of the subroutine calls that an application performs during execution) are also time series data. Over the colored bands in the traces chart below, you can see examples of time series data. The goal of tracing is to follow a program’s flow and data progression. Tracing encompasses a wide, continuous view of an application to find bugs in a program or application.
The examples above encompass two different types of time series data, as explained below.
Types of time series data
Time series data can be classified into two types:
- Measurements gathered at regular time intervals (metrics)
- Measurements gathered at irregular time intervals (events)
In the “Time series data examples” section above:
- Examples 3 (cluster monitoring) and 4 (health monitoring) depict metrics.
- Examples 5 (logs) and 6 (traces) depict events.
Because they happen at irregular intervals, events are unpredictable and cannot be modeled or forecasted since forecasting assumes that whatever happened in the past is a good indicator of what will happen in the future.
Linear vs. nonlinear time series data
A linear time series is one where, for each data point Xt, that data point can be viewed as a linear combination of past or future values or differences. Nonlinear time series are generated by nonlinear dynamic equations. They have features that cannot be modelled by linear processes: time-changing variance, asymmetric cycles, higher-moment structures, thresholds and breaks. Here are some important considerations when working with linear and nonlinear time series data:
- If a regression equation doesn’t follow the rules for a linear model, then it must be a nonlinear model.
- Nonlinear regression can fit an enormous variety of curves.
- The defining characteristic for both types of models are the functional forms.
Identifying time series data
Time series data is unique in that it has a natural time order: the order in which the data was observed matters. The key difference with time series data from regular data is that you’re always asking questions about it over time. An often simple way to determine if the dataset you are working with is time series or not, is to see if one of your axes is time.
Time series considerations
Immutability – Since time series data comes in time order, it is almost always is recorded in a new entry, and as such, should be immutable and append-only (appended to the existing data). It doesn’t usually change but is rather tacked on in the order that events happen. This property distinguishes time series data from relational data which is usually mutable and is stored in relational databases that do online transaction processing, where rows in databases are updated as the transactions are run and more or less randomly; taking an order for an existing customer, for instance, updates the customer table to add items purchased and also updates the inventory table to show that they are no longer available for sale.
The fact that time series data is ordered makes it unique in the data space because it often displays serial dependence. Serial dependence occurs when the value of a datapoint at one time is statistically dependent on another datapoint in another time (read “Autocorrelation in Time Series Data” for a detailed explanation about this topic).
Though there are no events that exist outside of time, there are events where time isn’t relevant. Time series data isn’t simply about things that happen in chronological order — it’s about events whose value increases when you add time as an axis. Time series data sometimes exists at high levels of granularity, as frequently as microseconds or even nanoseconds. With time series data, change over time is everything.
Different forms of time series data – Time series data is not always numeric — it can be int64, float64, bool, or string.
Time series data vs. cross-sectional and panel data
To determine whether your data is time series data, figure out what you’ll need to determine a unique record in the data set.
- If all you need is a timestamp, it’s probably time series data.
- If you need something other than a timestamp, it’s probably cross-sectional data.
- If you need a timestamp plus something else, like an ID, it’s probably panel data.
What the above means becomes clearer upon recalling the definition of (and differences between) each of these three data types:
Time series data definition
Time series data is a collection of observations (behavior) for a single subject (entity) at different time intervals (generally equally spaced as in the case of metrics, or unequally spaced as in the case of events).
For example: Max Temperature, Humidity and Wind (all three behaviors) in New York City (single entity) collected on First day of every year (multiple intervals of time)
The relevance of time as an axis makes time series data distinct from other types of data.
Cross-sectional data definition
Cross-sectional data is a collection of observations (behavior) for multiple subjects (entities such as different individuals or groups ) at a single point in time.
For example: Max Temperature, Humidity and Wind (all three behaviors) in New York City, SFO, Boston, Chicago (multiple entities) on 1/1/2015 (single instance)
In cross-sectional studies, there is no natural ordering of the observations (e.g. explaining people’s wages by reference to their respective education levels, where the individuals’ data could be entered in any order).
For example: the closing price of a group of 50 stocks at a given moment in time, an inventory of a given product in stock at a specific stores, and a list of grades obtained by a class of students on a given exam.
Panel data (longitudinal data) definition
Panel data is usually called as cross-sectional time series data as it is a combination of the above- mentioned types (i.e., collection of observations for multiple subjects at multiple instances).
Panel data or longitudinal data is multi-dimensional data involving measurements over time. Panel data contains observations of multiple phenomena obtained over multiple time periods for the same firms or individuals. A study that uses panel data is called a longitudinal study or panel study.
For example: Max Temperature, Humidity and Wind (all three behaviors) in New York City, SFO, Boston, Chicago (multiple entities) on the first day of every year (multiple intervals of time).
Differences between the three data types
Based on the above definitions and examples, let’s recap the differences between the three data types:
- A time series is a group of observations on a single entity over time — e.g. the daily closing prices over one year for a single financial security, or a single patient’s heart rate measured every minute over a one-hour procedure.
- A cross-section is a group of observations of multiple entities at a single time — e.g. today’s closing prices for each of the S&P 500 companies, or the heart rates of 100 patients at the beginning of the same procedure.
- If your data is organized in both dimensions — e.g. daily closing prices over one year for 500 companies — then you have panel data.
How is time series data understood and used?
Time series data is gathered, stored, visualized and analyzed for various purposes across various domains:
- In data mining, pattern recognition and machine learning, time series analysis is used for clustering, classification, query by content, anomaly detection and forecasting.
- In signal processing, control engineering and communication engineering, time series data is used for signal detection and estimation.
- In statistics, econometrics, quantitative finance, seismology, meteorology, and geophysics the time series analysis is used for forecasting.
Time series data can be visualized in different types of charts to facilitate insight extraction, trend analysis, and anomaly detection. Time series visualization and dashboarding tools include the InfluxDB UI and Grafana.
Time series data is used in time series analysis (historical or real-time) and time series forecasting to detect and predict patterns — essentially looking at change over time. Following is a brief overview of each.
Time series analysis methods
Time series analysis is the use of statistical methods to analyze time series data and extract meaningful statistics and characteristics about the data. Time series analysis helps identify trends, cycles, and seasonal variances to aid in the forecasting of a future event. Factors relevant to time series analysis include stationarity, seasonality and autocorrelation.
Time series analysis can be useful to see how a given variable changes over time (while time itself, in time series data, is often the independent variable). Time series analysis can also be used to examine how the changes associated with the chosen data point compare to shifts in other variables over the same time period.
Learn more about time series analysis methods, including spectral analysis, wavelet analysis, autocorrelation, and cross-correlation.
Time series forecasting methods
Time series forecasting uses information regarding historical values and associated patterns to predict future activity.
Time series forecasting methods include:
- Trend analysis
- Cyclical fluctuation analysis
- Seasonal pattern analysis
As with all forecasting methods, success is not guaranteed. Machine learning is often used for this purpose. So are its classical predecessors: Error, Trend, Seasonality Forecast (ETS), Autoregressive Integrated Moving Average (ARIMA) and Holt-Winters.
To ‘see things’ ahead of time, time series modeling (a forecasting method based on time series data) involves working on time-based data (years, days, hours, minutes) to derive hidden insights that inform decision-making. Time series models are very useful models when you have serially correlated data. Most businesses work on time series data to analyze sales projections for the next year, website traffic, competitive positioning and much more.
Learn more about time series forecasting methods, including decompositional models, smoothing-based models, and models including seasonality.
Where is time series data stored?
Time series data is often ingested in massive volumes and requires a purpose-built database designed to handle its scale. Properties that make time series data very different than other data workloads are data lifecycle management, summarization, and large range scans of many records. This is why time series data is best stored in a time series database built specifically for handling metrics and events or measurements that are time-stamped.
Learn more about time series data storage and about the best way to store, collect and analyze time series data.