This is the first in a series of detailed benchmarking tests comparing InfluxDB vs Elasticsearch, Cassandra, MongoDB and other databases for time-series data and metrics workloads.
At InfluxData, one of the most common questions we’ve heard from developers and architects over the last few months is, “How does InfluxDB compare to Elasticsearch for time-series workloads?” The question usually comes up for one of two reasons. First, they may be starting a brand new project and evaluating a few solutions head-to-head, in which case a direct comparison helps fill in their comparison grid. Second, they may already be using Elasticsearch to ingest logs in an existing monitoring setup, want to add metrics collection to their system, and suspect there is a better solution than Elasticsearch for that task.
Over the last few weeks, a few members of the InfluxData engineering and QA teams set out to compare the performance and features of InfluxDB and Elasticsearch for common time-series workloads, specifically looking at the rates of data ingestion, on-disk data compression, and query performance. InfluxDB outperformed Elasticsearch in all three tests: 8x greater write throughput, 4x less disk space when compared against Elastic’s time-series-optimized configuration, and 3.5x to 7.5x faster response times for tested queries.
It’s also important to note that configuring Elasticsearch wasn’t trivial: it requires up-front decisions about indexing, heap sizing, and how to work with the JVM. InfluxDB, on the other hand, is ready to use for time-series workloads out of the box, with no additional configuration and with a schema and query language designed for working with time series.
We felt that this data would prove valuable to engineers evaluating the suitability of both of these technologies for their use cases, specifically time-series use cases involving custom monitoring and metrics collection, real-time analytics, Internet of Things (IoT) and sensor data, plus container or virtualization infrastructure metrics. The benchmarking exercise did not look at the suitability of InfluxDB for workloads that are not time-series-based. InfluxDB is not designed to satisfy full-text search or log management use cases, so those were out of scope. For these use cases, we recommend sticking with Elasticsearch or similar full-text search engines.
To read the complete details of the benchmarks and methodology, download the “Benchmarking InfluxDB vs. Elasticsearch for Time-Series Data & Metrics Management” technical paper or register for the webinar on May 19.
Our overriding goal was to create a consistent, up-to-date comparison that reflects the latest developments in both InfluxDB and Elasticsearch, with later coverage of other databases and time-series solutions. We will periodically re-run these benchmarks and update our detailed technical paper with our findings. All of the code for these benchmarks is available on GitHub. Feel free to open issues or pull requests on that repository if you have any questions, comments, or suggestions.
Now, let’s take a look at the results…
InfluxDB is an open-source time-series database written in Go. At its core is a custom-built storage engine called the Time-Structured Merge (TSM) Tree, which is optimized for time-series data. Controlled by a custom SQL-like query language named InfluxQL, InfluxDB provides out-of-the-box support for mathematical and statistical functions across time ranges and is perfect for custom monitoring and metrics collection, real-time analytics, plus IoT and sensor data workloads.
Elasticsearch is an open-source search server written in Java and built on top of Apache Lucene. It provides a distributed, full-text search engine suitable for enterprise workloads. While not a time series database per se, Elasticsearch employs Lucene’s column indexes, which are used to aggregate numeric values. Combined with query-time aggregations and the ability to index on timestamp fields (which is also important for storing and retrieving log data), Elasticsearch provides the primitives for storing and querying time-series data.
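To make the difference between the two query models concrete, here is a minimal sketch of the same question ("mean value per minute over the last hour") expressed for each system. It is an illustration only, not one of the benchmark queries: the measurement name `cpu`, the field name `usage_user`, and the `timestamp` field name are assumptions chosen for the example, and the Elasticsearch body uses `fixed_interval`, which is the parameter name in recent releases (older releases used `interval`).

```python
import json

# Hypothetical names, used only for illustration.
MEASUREMENT = "cpu"
FIELD = "usage_user"


def influxql_mean_per_minute(measurement: str, field: str) -> str:
    """Build an InfluxQL query: time bucketing is part of the query language."""
    return (
        f'SELECT mean("{field}") FROM "{measurement}" '
        "WHERE time >= now() - 1h GROUP BY time(1m)"
    )


def elasticsearch_mean_per_minute(field: str) -> dict:
    """Build an Elasticsearch search body using a date_histogram aggregation.

    The target index is chosen in the request URL, not in the body.
    """
    return {
        "size": 0,  # return only aggregation results, no individual documents
        "query": {"range": {"timestamp": {"gte": "now-1h"}}},
        "aggs": {
            "per_minute": {
                "date_histogram": {"field": "timestamp", "fixed_interval": "1m"},
                "aggs": {"mean_value": {"avg": {"field": field}}},
            }
        },
    }


if __name__ == "__main__":
    print(influxql_mean_per_minute(MEASUREMENT, FIELD))
    print(json.dumps(elasticsearch_mean_per_minute(FIELD), indent=2))
```

In both cases the work is done at query time. The sketch only shows how the request is shaped and says nothing about how either system executes it.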
About the Benchmarks
In building a representative benchmark suite, we identified the most commonly evaluated characteristics for working with time-series data. We looked at performance across three vectors:
- Data ingest performance – measured in values per second
- On-disk storage requirements – measured in MBs
- Mean query response time – measured in milliseconds
Since Elasticsearch is a special-purpose search server and not intended for time-series data out of the box, some configuration changes are recommended by Elastic for storing these types of metrics. In our testing, we found that these changes:
- Did not have an impact on write or query performance
- Did make a difference in storage requirements
About the Dataset
For this benchmark, we focused on a dataset that models a common DevOps monitoring and metrics use case, where a fleet of servers reports system and application metrics at a regular time interval. We sampled 100 values across 9 subsystems (CPU, memory, disk, disk I/O, kernel, network, Redis, PostgreSQL, and Nginx) every 10 seconds. For the key comparisons, we looked at a dataset that represents 100 servers over a 24-hour period, which is a relatively modest deployment.
- Number of Servers: 100
- Values measured per Server: 100
- Measurement Interval: 10s
- Dataset duration(s): 24h, 48h, 72h, 96h
- Total values in dataset: 86,400,000 per day
This is only a subset of the entire benchmark suite, but it’s a representative example. If you’re interested in additional detail, you can read more about the testing methodology on GitHub.
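The "total values" figure follows directly from the dataset parameters listed above. As a quick sanity check, the arithmetic can be sketched as below. The constant and function names are ours, chosen for illustration, and the totals for the longer durations are simply the same formula applied to 48, 72, and 96 hours.

```python
# Dataset parameters taken from the list above.
SERVERS = 100
VALUES_PER_SERVER = 100
INTERVAL_SECONDS = 10
SECONDS_PER_HOUR = 3600


def total_values(duration_hours: int) -> int:
    """Number of individual values reported by the fleet over the duration."""
    # Each server reports once per interval, so count the intervals first.
    samples_per_server = duration_hours * SECONDS_PER_HOUR // INTERVAL_SECONDS
    return SERVERS * VALUES_PER_SERVER * samples_per_server


if __name__ == "__main__":
    for hours in (24, 48, 72, 96):
        print(f"{hours}h: {total_values(hours):,} values")
```

For the 24-hour dataset this gives 8,640 samples per server and 86,400,000 values in total, matching the per-day figure above.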
InfluxDB outperformed Elasticsearch by 8x when it came to data ingestion.
InfluxDB outperformed Elasticsearch by delivering 4x to 15x better compression, depending on the Elasticsearch configuration.
InfluxDB outperformed Elasticsearch by delivering a minimum of 3.5x better query performance.
Ultimately, many of you were probably not surprised that a purpose-built time-series database designed to handle metrics would significantly outperform a search database for these types of workloads. The gap is especially glaring when the workloads require scalability, as is common in real-time analytics and sensor data systems. There, a purpose-built time-series database like InfluxDB makes all the difference.
In conclusion, we highly encourage developers and architects to run these benchmarks for themselves to independently verify the results on their hardware and data sets of choice. However, for those looking for a valid starting point on which technology will give better time-series data ingestion, compression and query performance “out-of-the-box”, InfluxDB is the clear winner across all these dimensions, especially when the data sets become larger and the system runs over a longer period of time.
- Download the detailed technical paper: “Benchmarking InfluxDB vs Elasticsearch for Time-Series Data & Metrics Management”
- Check out the video playback of the companion webinar.
- Download and get started with InfluxDB
- Schedule a FREE 20 minute consultation with an InfluxData Solutions Architect to review your InfluxDB project
- Looking for InfluxDB Clustering on your infrastructure? Contact Sales to get a demo of InfluxEnterprise plus pricing information.