InfluxDB Markedly Outperforms OpenTSDB in Time-Series Data & Metrics Benchmark

Todd Persen — November 18, 2016

This is an update in a series of detailed benchmarking tests comparing InfluxDB with other databases for time-series data and metrics workloads. Previously, we completed benchmarking tests comparing InfluxDB with Elasticsearch, Cassandra, and MongoDB.

At InfluxData, one of the most common questions we have heard from developers and architects alike over the last few months is, “How does InfluxDB compare to OpenTSDB for time-series workloads?” The question usually comes up for one of two reasons. First, they are starting a brand new project and, as part of their due diligence, evaluating a few solutions head-to-head, so a direct comparison helps them build their comparison grid. Second, they might already be using OpenTSDB for ingesting logs in an existing monitoring setup, but would now like to integrate metrics collection into their system and believe there might be a better solution than OpenTSDB for this task.

Over the last few weeks, we set out to compare the performance and features of InfluxDB and OpenTSDB for common time-series workloads, looking specifically at data ingestion rates, on-disk data compression, and query performance. InfluxDB outperformed OpenTSDB in all three tests: it delivered 5x greater write throughput per server, used 16.5x less disk space than OpenTSDB’s time-series-optimized configuration, and returned results 4.0x faster for the tested queries.

It’s also important to note that OpenTSDB uses Apache HBase as its storage backend. It requires a significant amount of setup, configuration, and tuning to achieve optimal write and query performance, and it often requires a separate team to manage the HBase and HDFS deployment. InfluxDB, on the other hand, is ready to use for time-series workloads out of the box with no additional configuration, and its schema and query language are designed for working with time series.

To read the complete details of the benchmarks and methodology, download the “Benchmarking InfluxDB vs. OpenTSDB for Time-Series Data & Metrics Management” technical paper or register for the webinar on Nov 22.

Our overriding goal was to create a consistent, up-to-date comparison that reflects the latest developments in both InfluxDB and OpenTSDB. We will periodically re-run these benchmarks and update our detailed technical paper with our findings. All of the code for these benchmarks is available on GitHub. If you have any questions, comments, or suggestions, feel free to open issues or pull requests on that repository.

Now, let’s take a look at the results…

Versions Tested

InfluxDB v1.1.0

InfluxDB is an open-source time-series database written in Go. At its core is a custom-built storage engine called the Time-Structured Merge (TSM) Tree, which is optimized for time-series data. Controlled by a custom SQL-like query language named InfluxQL, InfluxDB provides out-of-the-box support for mathematical and statistical functions across time ranges and is perfect for custom monitoring and metrics collection, real-time analytics, plus IoT and sensor data workloads.

OpenTSDB v2.3.0RC2

OpenTSDB is a scalable, distributed time-series database written in Java and built on top of HBase. It is not a standalone time-series database: it relies on HBase as its data storage layer, so the OpenTSDB time-series daemons effectively act as a query engine with no shared state between instances. Additionally, OpenTSDB is primarily designed for generating dashboard visualizations, so it does not always return exact data for arbitrary time ranges. Note that we set the HBase replication factor to 1 to achieve the fairest comparison.

About the Benchmarks

In building a representative benchmark suite, we identified the most commonly evaluated characteristics for working with time-series data. We looked at performance across three vectors:

  • Data ingest performance – measured in values per second per server
  • On-disk storage requirements – measured in GB
  • Mean query response time – measured in milliseconds
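
As a rough illustration of how these three measurements can be derived from raw observations, here is a minimal Python sketch. The function names are hypothetical and are not taken from the benchmark code, the sketch assumes decimal gigabytes (10^9 bytes), and the inputs in the usage lines are made up for illustration; they are not results from this benchmark.

```python
from statistics import mean


def values_per_second(values_written, elapsed_seconds):
    """Ingest rate: values written divided by wall-clock seconds."""
    return values_written / elapsed_seconds


def bytes_to_gb(size_bytes):
    """On-disk size in GB, assuming decimal gigabytes (10^9 bytes)."""
    return size_bytes / 1e9


def mean_response_ms(latencies_ms):
    """Mean query response time over a list of per-query latencies in ms."""
    return mean(latencies_ms)


def improvement_factor(baseline, candidate, higher_is_better):
    """How many times better the candidate is than the baseline.

    For throughput, higher is better (candidate / baseline).
    For disk usage and latency, lower is better (baseline / candidate).
    """
    if higher_is_better:
        return candidate / baseline
    return baseline / candidate


# Made-up inputs, for illustration only (not benchmark results).
rate = values_per_second(1_000_000, 20)
factor = improvement_factor(40.0, 10.0, higher_is_better=False)
print(rate, factor)
```

Note that the direction of the comparison matters: a "5x" figure for throughput and a "4.0x" figure for response time are computed with opposite ratios, which is why the helper takes an explicit `higher_is_better` flag.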

About the Dataset

For this benchmark, we focused on a dataset that models a common DevOps monitoring and metrics use case, where a fleet of simulated servers periodically reports system and application metrics at a regular interval. We sampled 100 values across 9 subsystems (CPU, memory, disk, disk I/O, kernel, network, Redis, PostgreSQL, and Nginx) every 10 seconds. For the key comparisons, we looked at a dataset that represents 1,000 servers over a 4-hour period, which represents a relatively modest deployment.

  • Number of Servers: 1,000
  • Values measured per Server: 100
  • Measurement Interval: 10s
  • Dataset duration(s): 4h
  • Total values in dataset: 144,000,000
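
The total in the list follows directly from the other parameters. The short Python sketch below reproduces the arithmetic; the constant and function names are illustrative and do not come from the benchmark code.

```python
# Parameters copied from the dataset list; the names are illustrative.
NUM_SERVERS = 1_000
VALUES_PER_SERVER = 100
INTERVAL_SECONDS = 10
DURATION_HOURS = 4


def total_values(servers, values_per_server, interval_s, duration_h):
    """Count every value reported over the dataset's duration."""
    # Each server reports once per interval: 4 h * 3600 s / 10 s = 1,440 reports.
    reports_per_server = duration_h * 3600 // interval_s
    return servers * values_per_server * reports_per_server


total = total_values(NUM_SERVERS, VALUES_PER_SERVER, INTERVAL_SECONDS, DURATION_HOURS)
print(f"{total:,}")  # prints 144,000,000
```

Put another way, the simulated fleet produces 1,000 × 100 / 10 = 10,000 values for every second of simulated time.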

This is only a subset of the entire benchmark suite, but it’s a representative example. If you’re interested in additional detail, you can read more about the testing methodology on GitHub.

Write Performance

InfluxDB outperformed OpenTSDB by 5x when it came to data ingestion.

[Figure: write performance]

On-Disk Compression

InfluxDB outperformed OpenTSDB by delivering 16.5x better compression.

[Figure: on-disk storage]

Query Performance

InfluxDB outperformed OpenTSDB by delivering query response times at least 4.0x faster.

[Figure: query performance]

Summary

OpenTSDB and InfluxDB are both time-series databases. However, OpenTSDB requires multiple orders of magnitude more administration overhead to run in production, and it needs many application-dependent tweaks before it performs well. Even then, it did not return entirely correct results when running ad-hoc queries over our data. InfluxDB is simpler to set up, easier to use effectively, and returns correct query results.

In conclusion, we highly encourage developers and architects to run these benchmarks for themselves to independently verify the results on their hardware and data sets of choice. However, for those looking for a valid starting point on which technology will give better time-series data ingestion, compression and query performance “out-of-the-box”, InfluxDB is the clear winner across all these dimensions, especially when the data sets become larger and the system runs over a longer period of time.

What’s next

  • Downloads for the TICK-stack are live on our “downloads” page
  • Deploy on the Cloud: Get started with a FREE trial of InfluxCloud featuring fully-managed clusters, Kapacitor and Grafana.
  • Deploy on Your Servers: Want to run InfluxDB clusters on your servers? Try a FREE 14-day trial of InfluxEnterprise featuring an intuitive UI for deploying, monitoring and rebalancing clusters, plus managing backups and restores. 
  • Tell Your Story: Over 100 companies have shared their story on how InfluxDB is helping them succeed. Submit your testimonial and get a limited edition hoodie as a thank you.