InfluxDB is 2.4x Faster vs. MongoDB for Time Series Workloads
By Chris Churilo / Dec 18, 2018 / InfluxDB, Community
This blog post was updated on December 18, 2018 with the latest benchmark results for InfluxDB v1.7.2 and MongoDB v4.0.4. We update it regularly as new benchmark figures become available.
At InfluxData, one of the questions developers and architects have asked us most often over the last few months is, “How does InfluxDB compare to MongoDB for time series workloads?” The question usually comes up for one of two reasons. First, they are starting a brand new project and, as part of their due diligence, evaluating a few solutions head-to-head, so a direct comparison helps them fill in their comparison grid. Second, they already use MongoDB to ingest data in an existing application, now want to add metrics collection to their system, and suspect there may be a better solution than MongoDB for that task.
Over the last few weeks, we set out to compare the performance and features of InfluxDB and MongoDB for common time series workloads, specifically looking at the rates of data ingestion, on-disk data compression, and query performance. InfluxDB outperformed MongoDB in all three tests with 2.4x greater write throughput, while using 20x less disk space, and delivering 5.7x higher performance when it came to query speed.
To read the complete details of the benchmarks and methodology, download the “Benchmarking InfluxDB vs. MongoDB for Time-Series Data & Metrics Management” technical paper or watch the recorded webinar.
Our overriding goal was to create a consistent, up-to-date comparison that reflects the latest developments in both InfluxDB and MongoDB, with later coverage of other databases and time-series solutions. We will periodically re-run these benchmarks and update our detailed technical paper with our findings. All the code for these benchmarks is available on GitHub. Feel free to open issues or pull requests on that repository if you have any questions, comments, or suggestions.
Now, let’s take a look at the results…
InfluxDB is an open source time series database written in Go. At its core is a custom-built storage engine called the Time-Structured Merge (TSM) Tree, which is optimized for time-series data. Controlled by a custom SQL-like query language named InfluxQL, InfluxDB provides out-of-the-box support for mathematical and statistical functions across time ranges and is perfect for custom monitoring and metrics collection, real-time analytics, plus IoT and sensor data workloads.
MongoDB is an open source, document-oriented database, colloquially known as a NoSQL database, written in C and C++. Though it’s not generally considered a true time series database per se, its creators often promote its use for time series workloads. It offers modeling primitives in the form of timestamps and bucketing, which give users the ability to store and query time series data.
About the benchmarks
In building a representative benchmark suite, we identified the most commonly evaluated characteristics for working with time-series data. We looked at performance across three vectors:
- Data ingest performance - measured in values per second
- On-disk storage requirements - measured in bytes
- Mean query response time - measured in milliseconds
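As a rough illustration of how these three measurements can be derived from raw observations, here is a minimal Python sketch. The function names and the sample numbers are hypothetical and are not taken from the benchmark code on GitHub.

```python
from statistics import mean


def ingest_rate(values_written: int, elapsed_seconds: float) -> float:
    """Data ingest performance: values written per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return values_written / elapsed_seconds


def disk_usage_ratio(bytes_larger: int, bytes_smaller: int) -> float:
    """How many times more disk space one store uses than another for the same data."""
    if bytes_smaller <= 0:
        raise ValueError("bytes_smaller must be positive")
    return bytes_larger / bytes_smaller


def mean_latency_ms(latencies_seconds: list[float]) -> float:
    """Mean query response time in milliseconds, from per-query timings in seconds."""
    return mean(latencies_seconds) * 1000.0


# Placeholder inputs for illustration only (not benchmark results).
print(ingest_rate(1_000_000, 4.0))
print(disk_usage_ratio(8_000, 2_000))
print(mean_latency_ms([0.010, 0.020, 0.030]))
```

A statement such as "Nx greater write throughput" is then the ratio of two `ingest_rate` results measured on the same dataset and hardware.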
About the dataset
For this benchmark, we focused on a dataset that models a common DevOps monitoring and metrics use case, where a fleet of servers periodically reports system and application metrics at a regular time interval. We sampled 100 values across 9 subsystems (CPU, memory, disk, disk I/O, kernel, network, Redis, PostgreSQL, and Nginx) every 10 seconds. For the key comparisons, we looked at a dataset that represents 100 servers over a 24-hour period, which represents a relatively modest deployment.
- Number of servers: 100
- Values measured per server: 100
- Measurement interval: 10s
- Dataset duration(s): 24h
- Total values in dataset: 86M per day
This is only a subset of the entire benchmark suite, but it’s a representative example. If you’re interested in additional detail, you can read more about the testing methodology on GitHub.
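The "86M per day" figure in the dataset summary follows directly from the other parameters. A short Python check, using only the numbers listed above (the constant names are our own):

```python
SERVERS = 100
VALUES_PER_SERVER = 100
INTERVAL_SECONDS = 10
DURATION_SECONDS = 24 * 60 * 60  # 24h dataset duration

# Each server reports once per interval for the whole duration.
samples_per_server = DURATION_SECONDS // INTERVAL_SECONDS
total_values = SERVERS * VALUES_PER_SERVER * samples_per_server

print(samples_per_server)    # 8640
print(f"{total_values:,}")   # 86,400,000
```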
InfluxDB outperformed MongoDB by 2.4x when it came to data ingestion.
InfluxDB outperformed MongoDB by delivering 20x better compression.
InfluxDB outperformed MongoDB by 5.7x when it came to query speed.
The benchmarking tests and resulting data demonstrated that InfluxDB outperformed MongoDB in data ingestion, on-disk storage, and query performance by a significant margin. Specifically:
- InfluxDB outperformed MongoDB by 2.4x when it came to data ingestion
- InfluxDB outperformed MongoDB by delivering 20x better compression
- InfluxDB outperformed MongoDB by delivering 5.7x better query performance
It’s also important to note that configuring MongoDB to work with time series data wasn’t trivial. It requires up-front decisions about how to structure your collections and data types, which can be very time consuming and will have long-lasting impacts on how you can interact with your data and what types of queries you can run. InfluxDB, on the other hand, is ready to use for time series workloads out-of-the-box with no additional configuration.
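To make the modeling difference concrete, the following Python sketch shows one way the same CPU samples might be shaped for each system. It is a simplified illustration, not the schema used in the benchmark: the helper names, the field names, and the hourly bucket layout are all assumptions, and the line protocol helper handles only float fields and skips escaping of special characters.

```python
from datetime import datetime, timezone


def to_line_protocol(measurement: str, tags: dict, fields: dict, ts: datetime) -> str:
    """Format one point as InfluxDB line protocol (float fields only, no escaping)."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    ns = int(ts.timestamp()) * 1_000_000_000  # nanosecond-precision timestamp
    return f"{measurement},{tag_part} {field_part} {ns}"


def to_hourly_bucket(measurement: str, hostname: str, samples: list) -> dict:
    """Group samples from one host into a single hypothetical hourly bucket document.

    `samples` is a list of (datetime, fields_dict) pairs that fall within one hour.
    The bucket size and document shape are up-front choices made by the developer.
    """
    bucket_start = samples[0][0].replace(minute=0, second=0, microsecond=0)
    return {
        "measurement": measurement,
        "hostname": hostname,
        "bucket_start": bucket_start,
        "samples": [
            {"offset_s": int((ts - bucket_start).total_seconds()), **fields}
            for ts, fields in samples
        ],
    }


t0 = datetime(2018, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
t1 = datetime(2018, 1, 1, 0, 0, 10, tzinfo=timezone.utc)

line = to_line_protocol("cpu", {"hostname": "host_0"}, {"usage_user": 58.1}, t0)
doc = to_hourly_bucket("cpu", "host_0", [(t0, {"usage_user": 58.1}), (t1, {"usage_user": 60.4})])

print(line)
print(doc["samples"])
```

In the bucketed form, the bucket width, the key names, and whether samples are nested or flattened all have to be decided before data is written, and those choices shape which queries are practical later.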
In conclusion, we highly encourage developers and architects to run these benchmarks themselves to independently verify the results on their hardware and data sets of choice. However, for those looking for a valid starting point on which technology will give better time series data ingestion, compression and query performance “out-of-the-box”, InfluxDB is the clear winner across many dimensions, especially when the data sets become larger and the system runs over a longer period of time.
- Download the detailed technical paper: "Benchmarking InfluxDB vs. MongoDB for Time Series Data, Metrics & Management".
- Check out the video playback of the companion webinar.
- Download and get started with InfluxDB.
- Join the Community!