For those who don’t know, InfluxData has been a member of the Eclipse Foundation since late last year. We joined because it turned out that a few of the Eclipse IoT Projects were already based on InfluxDB or used it as their underlying datastore. Who knew? That’s often the way things go with an open source product. You don’t always know who’s using it, and for what, until they come to buy a license, and Eclipse wasn’t likely to be doing that.
But that has little, if anything, to do with why I’m writing this post. You see, the Eclipse Foundation’s annual IoT Developer Survey results came out this week. We’ve been waiting for them with eager anticipation, and they did not disappoint! Let me start off by saying that there weren’t really any shocking revelations in the results, at least for me. If you’re interested in the entire report, please feel free to go read it here. I’m going to run down the findings that I find especially interesting as they relate to InfluxData. You can also see results from the 2017, 2016 and 2015 surveys for the larger trends. There will also be a webinar on Thursday, April 19th, which you should definitely not miss.
I was interested to see that Google was not only failing to gain momentum but was actually losing ground to Azure and AWS. It’s not often that Google enters a market and doesn’t gain at least a prominent market share. That being said, it was really no surprise to see AWS dominating the field with 24.1%. We here at InfluxData are part of the AWS Partner Network and have a strong relationship with AWS.
For the first time, one of the top concerns of IoT developers is actually data collection and analytics—up over 50% from last year to 18%. Security still tops the list, as well it should! We here at InfluxData are happy to see more IoT developers beginning to focus on data collection and analytics as an important concern for any IoT deployment. I’ve been saying for well over a decade that if you’re not actually taking action based on the IoT data you’re collecting, you might as well not collect it, and it’s good to see the wider IoT community taking that up as well.
It would also seem that connectivity is beginning to be seen as a ‘solved’ problem for many developers, as is interoperability. I’ve been doing IoT for a very long time, and seeing connectivity and interoperability decline as key concerns is a welcome sign. I see the increase in concern over data collection and analytics as a sign of opportunity for InfluxData since I’ve long seen InfluxData as the best, easiest, and most flexible data collection and analytics platform for IoT.
The decrease in use of HTTP as an IoT messaging protocol was a bit of a surprise to me, but the continued dominance of MQTT was not. I’m pleased that with the growth of both MQTT and AMQP as IoT messaging protocols, InfluxDB (through Telegraf) supports both. As these protocols expand, look for us to expand our support for them natively. I also noticed the apparent loss in popularity of CoAP. WebSockets seem to be the new darling though, coming out of nowhere to gain the #3 spot.
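To make the MQTT-to-InfluxDB path a little more concrete, here is a minimal sketch of what happens conceptually when a sensor message ends up as a point in InfluxDB’s line protocol. This is not how Telegraf is implemented; it only illustrates the shape of the data. The topic layout (`sensors/<site>/<device>`), the `environment` measurement name, and the JSON payload fields are all assumptions I made up for the example.

```python
import json


def _escape(value: str) -> str:
    # Line protocol requires commas, equals signs, and spaces to be
    # backslash-escaped in tag keys, tag values, and field keys.
    for char in (",", "=", " "):
        value = value.replace(char, "\\" + char)
    return value


def to_line_protocol(topic: str, payload: str, timestamp_ns: int) -> str:
    """Turn one MQTT message into one line-protocol point.

    Assumes a hypothetical topic layout of sensors/<site>/<device>
    and a flat JSON payload whose values are all numeric.
    """
    _, site, device = topic.split("/")
    reading = json.loads(payload)

    # Tags carry the metadata (which device, where).
    tags = f"device={_escape(device)},site={_escape(site)}"

    # Fields carry the measured values; everything is written as a float.
    fields = ",".join(
        f"{_escape(key)}={float(value)}" for key, value in sorted(reading.items())
    )

    # "environment" is a made-up measurement name for this sketch.
    return f"environment,{tags} {fields} {timestamp_ns}"


if __name__ == "__main__":
    print(
        to_line_protocol(
            "sensors/lab/node1",
            '{"temperature": 21.5, "humidity": 40}',
            1523000000000000000,
        )
    )
```

In a real deployment you would more likely let Telegraf’s MQTT consumer input do this parsing for you, but seeing the tags-versus-fields split spelled out makes it clearer why a time series database is such a natural fit for sensor readings.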
You’ve been waiting for me to get here, haven’t you? I know I have! The section on IoT Databases was of particular interest to me. The notion that a traditional SQL database like MySQL is still dominant is not all that surprising if you think about it. Time Series Databases are a relative newcomer to the field and many developers simply use the tools with which they are most familiar. SQL databases have been around forever, so many developers are likely just more comfortable using them. The popularity of MongoDB for IoT still baffles me. I have, myself, built and deployed IoT projects on MongoDB because I didn’t know any better. MongoDB was relatively easy to set up and configure—though getting a front-end on it took a lot of coding. But then I found Time Series Databases and I’ve never looked back.
One of the reasons why people use MongoDB is the ease of setup, but I’d challenge anyone to get MongoDB set up, with a dashboard of system statistics up and running in a browser, in under 5 minutes. I did exactly that with InfluxDB. I guess that could be viewed as a challenge, so if you see it that way, feel free to try to prove me wrong.
But the main reason I scratch my head over using MongoDB for IoT is sheer performance. I went on a bit of a Twitter rant about this yesterday, but I’ll go into it here as well, in case you’re not following me on Twitter—you really should be! Go follow me now! Here are a few performance numbers to keep in mind when considering MongoDB for IoT:
Configuration: AWS c4.4xlarge: Intel Xeon E5-2666 v3 2.9GHz, 16 vCPU, 30GB RAM, 1x EBS Provisioned IOPS SSD 120GB
| Database | Ingestion Rate (values/sec) | Query duration | Query/sec | Size on disk |
For those of you that don’t want to calculate this for yourself, that’s 57x faster ingestion, 11.5x faster query speed and queries/sec, and nearly 100x lower disk usage for InfluxDB vs. MongoDB. Now, explain to me precisely why you’re still going to choose MongoDB, please. Maybe for your sensor metadata, maybe. But for your sensor data? Your time series sensor data? Nope.
Now, for the chart:
The number 3 spot is a great place to be right now, especially considering where we are in relation to the next most popular TSDB. But we should be well ahead of MongoDB and gaining ground on MySQL. Ever since I was a kid, I’ve always been told: “Use the right tool for the job!” Don’t use a screwdriver to pound a nail. (I admit I’ve done this one though, in a pinch.) So don’t use a general-purpose transactional database for time series data. And for sure don’t use a document database for time series data. That’s like using a screwdriver to drive a railroad spike, and even I wouldn’t try that! Sure, you could do it, but it’s going to take 10 times longer and require 100 times more effort.
New motto: Friends don’t let friends use MongoDB for IoT data!
If you’re interested in seeing the full results of the survey and not just taking my word for it, feel free to check out the SlideShare of it. If you’re interested in crunching the raw numbers yourself because you’re just that kind of data geek, you can download the raw results as a Google Spreadsheet. If you’re interested in hearing Benjamin Cabé discuss the results with Mike Milinkovich, then quick! RSVP for the webinar on Thursday, April 19th. If you’re reading this after that date, the link will also take you to the recorded version, so don’t miss it!