The best way to store, collect and analyze time series data
In this technical paper, InfluxData CTO Paul Dix walks you through what time series data is (and isn’t) and what makes it different from stream processing, full-text search and other solutions.
By reading this tech paper, you will:
- Learn how time series data is all around us.
- See why a purpose-built TSDB is important.
- Read about how a time series database is optimized for time-stamped data.
- Understand the differences between metrics, events, and traces, and some of the key characteristics of time series data.
Storing time series data
Time series data is best stored in a time series database (TSDB) built specifically for handling metrics and events that are time-stamped. This is because time series data is often ingested in massive volumes that require a purpose-built database designed to handle that scale. Time series data also requires unique data engineering solutions to efficiently execute common time series tasks such as data lifecycle management, data summarization, and queries over large time ranges. Time series databases are designed to handle those tasks while accounting for changes over time.
With a time series database, it is common to:
- Request a summary of data over a large time period. TSDBs are optimized for exactly this use case and can return results in milliseconds, even over months of data.
- Write high volumes of data.
- Automatically downsample or expire old time series that are no longer useful, or keep high-precision data around for only a short period of time. This kind of data lifecycle management is difficult for application developers to implement on top of regular databases; a time series database provides it out of the box.
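The lifecycle management described in the last item can be sketched in a few lines of Python. This is only an illustration of the idea, not how InfluxDB implements it: the function name `apply_lifecycle`, its parameters, and the sample points are all hypothetical. Raw points newer than a cutoff are kept at full precision, and older points are rolled up into per-bucket means.

```python
from collections import defaultdict
from statistics import mean


def apply_lifecycle(points, now, raw_ttl, bucket_width):
    """Keep recent raw points; roll older points up into bucket means.

    points: iterable of (timestamp_seconds, value) pairs (hypothetical shape).
    Returns (raw, rollups), each a list of (timestamp, value) sorted by time.
    """
    cutoff = now - raw_ttl
    raw = []
    buckets = defaultdict(list)
    for ts, value in points:
        if ts >= cutoff:
            # Recent data stays at full precision.
            raw.append((ts, value))
        else:
            # Older data is grouped by the start of its time bucket.
            buckets[ts - ts % bucket_width].append(value)
    rollups = [(start, mean(values)) for start, values in sorted(buckets.items())]
    return sorted(raw), rollups


points = [(0, 1.0), (30, 3.0), (60, 5.0), (100, 7.0), (130, 9.0)]
raw, rollups = apply_lifecycle(points, now=150, raw_ttl=60, bucket_width=60)
print(raw)      # [(100, 7.0), (130, 9.0)]
print(rollups)  # [(0, 2.0), (60, 5.0)]
```

Writing and scheduling this logic by hand on top of a general-purpose database is the work that a time series database takes over.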
InfluxDB is the most popular open source time series database (TSDB) because it:
- Is purpose-built, with several design choices that optimize it for time series data and for high write and query loads.
- Has a low barrier to adoption, or Time to Awesome™. InfluxData prioritizes developer happiness by providing a guided onboarding experience, multi-cloud support, a visual data analysis tool or query builder, powerful client libraries for several languages, and around-the-clock support for open source users through community channels.
- Enables high cardinality use cases by storing the index on disk with TSI.
- Aims to be consistent, available, and partition-tolerant, which is unusual in the NoSQL world; it is a hybrid, non-strict CA/AP database.
- Is written in Go for fast and concurrent data fetching with an easy binary deploy.
Purpose-built time series database
InfluxDB was built from the ground up to be a purpose-built time series database, to handle high write and query loads. Time was built-in from the beginning. InfluxDB is part of a comprehensive platform that supports the collection, storage, monitoring, visualization and alerting of time series data. It’s much more than just a time series database:
- InfluxDB 1.x is the open source time series database component of the TICK Stack (Telegraf, InfluxDB, Chronograf, Kapacitor).
- InfluxDB 2.0 incorporates everything you need in a time series platform into a single binary.
InfluxDB OSS (the open source edition of InfluxDB) runs on a single node.
InfluxDB is also available in:
- A Cloud edition — an elastic, serverless managed database as a service (InfluxDB Cloud)
- An Enterprise edition — a subscription which turns any InfluxData instance into a production-ready cluster that can run anywhere (InfluxDB Enterprise) and that provides high availability to eliminate a single point of failure.
For time series workloads, time series databases generally outperform relational databases. Because a purpose-built time series database like InfluxDB has no external dependencies, its lower operating costs can translate into higher productivity. It can easily handle large sets of time-stamped data, can be used for real-time monitoring, and also makes it easy to manage the lifecycle of your data. To identify relationships and correlations in data from time series databases, you can use data warehousing.
The best time series database
InfluxDB is open source, performant and scalable:
- The InfluxDB data model is quite different from other time series solutions like Graphite, RRD, or OpenTSDB. InfluxDB 1.8 has a Line Protocol for sending time series data (which makes lookups for specific series very fast by informing InfluxDB of the data’s measurement, tag set, field set, and timestamp). InfluxDB 1.7+ comes built-in with Flux, a powerful data scripting and query language that helps developers see change across time.
- Timestamps in InfluxDB can be second, millisecond, microsecond, or nanosecond precision. The micro and nanosecond scales make InfluxDB a good choice for use cases in finance and scientific computing where other solutions would be excluded.
- Compression is variable depending on the level of precision the user needs. On disk, the data is organized in a columnar-style format where contiguous blocks of time are set for each measurement, tag set, and field. Each field is therefore organized sequentially on disk for blocks of time, which makes calculating aggregates on a single field a very fast operation.
- There is no limit to the number of tags and fields that can be used. Other time series solutions don’t support multiple fields, and have limitations on the number of tags that can be used.
InfluxDB accelerates time to awesome for developers, supports many data types, and allows the user unlimited fields and tags — enabling you to scale effortlessly as you grow. Because of all these factors, InfluxDB is the best solution for working with time series data.
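To make the data model above more concrete, here is a minimal Python sketch that formats one point as a line protocol string: measurement, tag set, field set, and a nanosecond timestamp. The helper names (`to_line`, `format_field`, `_escape`) and the sample values are hypothetical, and the escaping covers only the common cases, so treat it as an illustration rather than a complete encoder.

```python
def _escape(text, chars):
    """Backslash-escape each of the given characters in text."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Build 'measurement,tag_set field_set timestamp' for a single point."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key, val in sorted(tags.items()):  # sorted tag keys give a stable series key
        head += "," + _escape(key, ",= ") + "=" + _escape(val, ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + format_field(val) for key, val in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


line = to_line(
    "cpu",
    {"host": "server 01", "cpu": "cpu-total"},
    {"usage_idle": 97.5, "cores": 8},
    1_600_000_000_000_000_000,
)
print(line)
# cpu,cpu=cpu-total,host=server\ 01 usage_idle=97.5,cores=8i 1600000000000000000
```

Note that tags are always strings, while fields carry typed values; that split is what lets tags be indexed while fields hold the measured data.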
Overview of InfluxDB features
InfluxDB features that make it a great choice for working with time series data include:
- Custom high-performance datastore written specifically for time series data. The TSM engine allows for high ingest speed and data compression.
- Compiles into a single binary with no external dependencies (InfluxDB is written entirely in Go).
- Simple, high-performing write and query HTTP APIs.
- Plugin support for other data ingestion protocols such as Graphite, collectd, and OpenTSDB.
- Expressive SQL-like query language tailored to easily query aggregated data (InfluxDB 1.8), and Flux, a powerful new language that enables you to query and code together (InfluxDB 2.0).
- Tags allow series to be indexed for fast and efficient queries.
- Retention policies efficiently auto-expire stale data.
- Continuous Queries and Tasks automatically compute aggregate data to make frequent queries more efficient.
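As an illustration of the HTTP write API mentioned above, the following Python sketch builds (but does not send) a write request for InfluxDB 1.x, which accepts line protocol on its `/write` endpoint with `db` and `precision` query parameters. The function name `build_write_request`, the database name, and the sample point are hypothetical; InfluxDB 2.x uses a different endpoint and authentication, which this sketch does not cover.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_write_request(base_url, database, lines, precision="ns"):
    """Prepare a POST to the 1.x /write endpoint carrying line protocol."""
    query = urlencode({"db": database, "precision": precision})
    body = "\n".join(lines).encode("utf-8")  # one point per line
    return Request(f"{base_url}/write?{query}", data=body, method="POST")


req = build_write_request(
    "http://localhost:8086",  # assumes a local server on the default port
    "telegraf",               # hypothetical database name
    ["cpu,host=server01 usage_idle=97.5 1600000000"],
    precision="s",
)
print(req.get_method(), req.full_url)
# POST http://localhost:8086/write?db=telegraf&precision=s
```

Passing the prepared request to `urllib.request.urlopen` would send it to a running server; that step is left out so the example runs without one.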
InfluxDB’s open source metrics collection agent
One main advantage of using InfluxDB to store time series data is that it has a native metrics collection agent, Telegraf. An open source plugin-driven server agent, Telegraf can be used to collect and send metrics and events from databases, systems, and IoT sensors.
Telegraf is written in Go, compiles into a single binary with no external dependencies, and has a minimal memory footprint. Its plugin system allows new inputs and outputs to be easily added, with many integrations to a variety of metrics, events, and logs from popular containers and systems. Telegraf makes integration seamless and easy. Discover all of Telegraf’s 250+ integrations:
- Telegraf input plugins collect metrics from the system, services, or third party APIs.
- Telegraf output plugins write metrics to various destinations.
- Telegraf aggregator plugins create aggregate metrics (such as mean, min, max, quantiles).
- Telegraf processor plugins transform, decorate, and filter metrics.
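The four plugin roles above form a pipeline: inputs produce metrics, processors transform them, aggregators summarize them, and outputs write them somewhere. The Python sketch below only mimics that flow with plain functions. It is not Telegraf's plugin API (Telegraf plugins are written in Go), and every name and value in it is hypothetical.

```python
from statistics import mean


def input_plugin():
    """Stand-in input: returns hard-coded metrics instead of polling a system."""
    return [
        {"name": "cpu", "tags": {"host": "a"}, "value": 10.0},
        {"name": "cpu", "tags": {"host": "a"}, "value": 30.0},
        {"name": "mem", "tags": {"host": "a"}, "value": 55.0},
    ]


def processor_plugin(metrics):
    """Stand-in processor: decorates every metric with an extra tag."""
    return [{**m, "tags": {**m["tags"], "dc": "eu-1"}} for m in metrics]


def aggregator_plugin(metrics):
    """Stand-in aggregator: min, max and mean per (name, tag set)."""
    grouped = {}
    for m in metrics:
        key = (m["name"], tuple(sorted(m["tags"].items())))
        grouped.setdefault(key, []).append(m["value"])
    return {
        key: {"min": min(vals), "max": max(vals), "mean": mean(vals)}
        for key, vals in grouped.items()
    }


def output_plugin(aggregates, sink):
    """Stand-in output: appends results to a list instead of a database."""
    sink.extend(sorted(aggregates.items()))


sink = []
output_plugin(aggregator_plugin(processor_plugin(input_plugin())), sink)
for key, stats in sink:
    print(key, stats)
```

Because each stage only consumes and produces metrics, stages can be added or swapped without touching the others, which is the property the plugin system relies on.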
InfluxDB for time series analysis
Flux is InfluxData’s new functional, standalone data scripting language designed for querying, analyzing, and acting on time series data. It takes the power of InfluxQL and the functionality of TICKscript and combines them into a single, unified syntax. The Flux standard library includes built-in functions and importable packages that retrieve, transform, process, and output data.
The following example illustrates pulling data from a bucket (similar to an InfluxQL database) for the last hour, filtering that data by the cpu measurement and the cpu=cpu-total tag, windowing the data in 1 minute intervals, and calculating the average of each window:
from(bucket:"telegraf/autogen")
  |> range(start:-1h)
  |> filter(fn:(r) => r._measurement == "cpu" and r.cpu == "cpu-total")
  |> aggregateWindow(every: 1m, fn: mean)
Flux is packaged with InfluxDB v1.7+ and does not require any additional installation; however, it does need to be enabled. The best way to familiarize yourself with Flux is to walk through creating a simple Flux query. The Flux getting started documentation does just that. Learn more about time series analysis and time series forecasting using InfluxDB, and download InfluxDB with Flux.
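For readers new to Flux, it can help to see what the filter and window stages of the query above compute. The following plain-Python sketch mirrors those two stages on a handful of in-memory rows. The row shape, the function name `window_means`, and the sample data are hypothetical, and labeling each window by its start time is a choice made for this sketch, not a statement about how Flux labels windows.

```python
from collections import defaultdict
from statistics import mean


def window_means(rows, measurement, tag_key, tag_value, every):
    """Filter rows by measurement and tag, then average values per time window."""
    windows = defaultdict(list)
    for r in rows:
        # Equivalent of the filter() stage.
        if r["measurement"] != measurement or r["tags"].get(tag_key) != tag_value:
            continue
        # Equivalent of windowing: align the timestamp to its window start.
        windows[r["time"] - r["time"] % every].append(r["value"])
    # Equivalent of fn: mean, applied to each window.
    return [(start, mean(vals)) for start, vals in sorted(windows.items())]


rows = [
    {"time": 0, "measurement": "cpu", "tags": {"cpu": "cpu-total"}, "value": 10.0},
    {"time": 30, "measurement": "cpu", "tags": {"cpu": "cpu-total"}, "value": 20.0},
    {"time": 60, "measurement": "cpu", "tags": {"cpu": "cpu-total"}, "value": 40.0},
    {"time": 70, "measurement": "cpu", "tags": {"cpu": "cpu0"}, "value": 99.0},
    {"time": 80, "measurement": "mem", "tags": {}, "value": 1.0},
]
result = window_means(rows, "cpu", "cpu", "cpu-total", every=60)
print(result)  # [(0, 15.0), (60, 40.0)]
```

In Flux the same steps are expressed declaratively and run inside the database, next to the data, rather than in application code.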
InfluxDB use cases and solutions
InfluxDB is widely used by developers and enterprises around the world as a backing store for any use case involving large amounts of time-stamped data, including DevOps monitoring, application metrics, IoT sensor data, real-time analytics and machine learning.
Solutions built using InfluxDB include:
- Application Performance Monitoring (APM)
- Google Cloud Monitoring
- Industrial IoT
- Kubernetes Monitoring
- Network Performance Monitoring
- Stream Processing
You can find numerous time series case study examples to learn how companies in various industries have put InfluxDB to work for their use case.
Getting started with the serverless database as a service
Because it is a SaaS solution, InfluxDB Cloud is the easiest way to get started. Built from an open source core, InfluxDB Cloud is a serverless, elastically scaling database as a service that is maintained and administered by the experts at InfluxData. InfluxDB Cloud is architected to run on Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure.