Webinar Date: 2018-06-05 08:00:00 (Pacific Time)
WayKonect is a Lille startup specialized in fleet management. Their focus is on data analysis and actionable intelligence from vehicles, and they take a driver-oriented approach to the field. WayKonect sets itself apart by offering proactive tools that optimize all aspects of fleet management to maximize performance and minimize costs. In this webinar, Jonathan Schmidt, CTO and cofounder of WayKonect, will share how they use InfluxEnterprise to gather data and turn it into insights, providing real-time coaching tips that help drivers improve fuel efficiency and safety.
Watch the webinar “How to Turn Data into Insights to Help Drivers Improve Fuel Efficiency & Safety” by filling out the form and clicking on the download button on the right. This will open the recording.
Here is an unedited transcript of the webinar “How to Turn Data into Insights to Help Drivers Improve Fuel Efficiency & Safety”. This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.
• Chris Churilo: Director Product Marketing, InfluxData
• Jonathan Schmidt: CTO, WayKonect
Chris Churilo 00:00:00.702 All right. It’s three minutes after the hour, so we will go ahead and get started. And I’d like to introduce our speaker today. Jonathan Schmidt is from WayKonect, and he’s going to be presenting on their use case as well as his best practices. And as I mentioned, if you have any questions, please don’t be shy. Jonathan will be more than happy to answer your questions. So either post them in the Q&A, or the chat panel, or as I mentioned, raise your hand and we can meet you. You can ask your question directly. All right, Jonathan, why don’t you take it away?
Jonathan Schmidt 00:00:36.049 Thank you very much, Chris. Good afternoon, good morning, or other applicable greetings. I’m Jonathan Schmidt, CTO of WayKonect, and I’ll be presenting this InfluxDB webinar. Quick overview of the topics I’ll touch on today. First will be WayKonect in a nutshell, a very quick presentation of what we do and why we do it. Why the choice of InfluxDB, why we went with this technology, and, as it happens, why we’re very happy with it. Then a bit of a talk about schema, what to do, what not to do, and how we actually solve that thorny problem at WayKonect. Then I’ll touch on handling private information under the GDPR, that new European regulation about personal data and its handling, in a technical way, of course. And at the end, I’ll share a tip we actually developed for handling geotemporal data efficiently and without blowing up the cardinality count. And then, of course, Q&A at the end.
Jonathan Schmidt 00:02:01.158 First, WayKonect in a nutshell. So we are an intelligent fleet management solution. We use telematics trackers in cars to actually get telemetry about position, health, maintenance, and anything and everything that the onboard computer will actually tell us. And our specialty is data analysis. We are hardware-agnostic, so we connect directly to some cars. We use trackers when applicable. We can also use geolocation beacons, anything that actually gives us telemetry. We have a very driver-centric approach since drivers are responsible for more than 50% of any cost incurred for a vehicle, be it the way it’s driven, how much it’s driven, to gas behavior, maybe accidents sometimes.
Jonathan Schmidt 00:03:10.745 Our motto is enabling privacy and promoting transparency within an organization. Transparency about how a car is used, why it is used, and how it’s maintained. Privacy is very important to us, especially here in France and in Europe in general. So our solution has, at its core, that end users can, of course, request not to be tracked by geolocation, at their own discretion. Despite not being able to follow them, we can still aggregate data on how many kilometers were driven, how the car was driven, and how much gas was consumed, so that everybody in the company has a clear view, the clear picture, of the cars. That’s really one of the pain points we were trying to solve for companies, because cars are very expensive both to buy or lease and to maintain. And despite that cost, accountability and information about them is very hard to come by and harder even to trust. You can, of course, ask your users to give you their mileage, and odometer, and trip information regularly, but most of them will forget. Sometimes they will make mistakes in giving you a report, not out of malice, just out of—it’s so many numbers to type in. And when you input that into an Excel sheet, as 85% of our clients do, or into another IT system, whatever it may be, then you’ll have one in four numbers that will be written wrong, just typing errors. So, yeah. Hard to get, harder to trust. And that’s the problem we try to solve. We also have a partner network that allows us to maintain cars more cheaply. And, all in all, what we sell is $650 per year per vehicle in savings, that’s including the cost of our solution, by the way, and 50% less time spent on fleet management for the fleet manager. That’s our promise. So that’s it for what we do.
Jonathan Schmidt 00:05:58.214 Why InfluxDB? Well, we were one of the first companies on the market to actually do 1-hertz sampling on our vehicles. 1-hertz sampling means a lot of data coming in. That’s 1 hertz and between 10 to 30 or 40 columns, depending on the vehicle, and that’s a lot of data coming in. So at first, we were actually using flat files, CSV or our own variant of CSV. Obviously, that didn’t scale well at all. Even with one file per vehicle per day, it was a nightmare to maintain and very hard to actually write and reread after that. Another thing that was actually important to us was the NoSQL, almost-schemaless design of InfluxDB. Because being hardware-agnostic means we have a lot of variety in our data. A geolocation beacon will only give us maybe latitude, longitude, sometimes heading and speed. That’s about it. Whereas a full-blown telematics tracker will give you 20, 30, 40 data points every second about your vehicle. And you have to input all of them. And, of course, two telematics trackers won’t have the same columns, won’t have the same names. So you have to synchronize a bit, but also be able to add a lot of columns as you go.
Jonathan Schmidt 00:07:42.762 Another choice for us was the way Influx handles duplicates, handles overwrites of data, collisions. Because let’s face it, it’s industrial data collected by telematics systems, some that are very stable, others a bit less so. And so you have missing data and you have duplicates all the time. It happens. Sometimes you will even have a tracker sending you completed data or added-value data afterwards on a timestamp that you had already recorded. So handling collisions correctly was very important for us. And that’s something Influx does beautifully. If you have the same columns, you’ll just overwrite. If you have new columns and some of them are missing, you’ll just fill in the blanks. You don’t lose data by rewriting a line. That was important. And as a company dedicated to a thing that can run 24/7, high availability and scalability were very important. High availability mostly because cars run anytime, anywhere, and we are a SaaS solution, meaning people can access our portal at any time, literally. So we needed to be able to give them a highly available solution. Scalability because, let’s face it, we are a startup. We started with about 100 vehicles. We’re at a few thousand today. And our roadmap puts us at about 10,000, and at the 100,000 bar within the next five years. So scalability really was the operative word here. And that’s why we actually chose InfluxEnterprise. And we have a cluster today set up. And our estimates put us at about 200,000 vehicles to actually start stressing the hardware on it, which is not top-of-the-line hardware. It’s entry-line hardware from [inaudible]. And so we have good confidence that we can meet and exceed the scale that’s asked of us just by scaling the hardware up a bit and scaling out after that.
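The collision behavior described here (a point arriving with the same series and timestamp as an existing point merges into it, overwriting matching fields and preserving the rest) can be sketched as a simple dictionary merge in Python. This is an illustration of InfluxDB’s semantics, not its actual implementation:

```python
def merge_point(existing_fields, incoming_fields):
    """Model InfluxDB's behavior when a point arrives with the same
    measurement, tag set, and timestamp as an existing point:
    matching fields are overwritten, other fields are preserved."""
    merged = dict(existing_fields)
    merged.update(incoming_fields)
    return merged

# A tracker first sends a partial point, then re-sends the same
# timestamp with a corrected speed and an extra fuel reading.
first  = {"lat": 50.63, "lon": 3.06, "speed_kmh": 42}
resend = {"speed_kmh": 45, "fuel_pct": 71}
print(merge_point(first, resend))
# {'lat': 50.63, 'lon': 3.06, 'speed_kmh': 45, 'fuel_pct': 71}
```

This is why re-sending completed data for an already-recorded timestamp fills in the blanks rather than losing the earlier fields.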
Jonathan Schmidt 00:10:18.165 A few words about schema. The one rule about schema, or the one rule we actually try to follow here at WayKonect, is: one timeline means one series. So as collisions will overwrite data, what you really want is one physical device per series. That’s the first thing. And then sometimes you have duplicate messages on the multiple channels on which a vehicle can send messages. And then you divide by channel. That’s what we did. Tracker first, which is a unique identifier for a telematics solution, and then the message type, which is our take on message channels. Of course, that’s mostly so we can track metrics across multiple channels and actually see if one channel is doing poorly. We could keep only the operating trackers and, with some correct field management, reduce that cardinality. That’s something we are actually looking into.
Jonathan Schmidt 00:11:36.369 Cardinality, that’s something that’s really important. With Influx, for scalability, cardinality is key. And it doesn’t necessarily mean number of tags. You can have 100 tags to organize your data. If these 100 tags translate to a few thousand series, 10,000 series, you’re good. If those 100 tags and their permutations give you 1 million, 10 million, 100 million or more series, then you’re going to run into trouble. And then you’re going to need very, very powerful hardware to run it. So, really, when designing a schema, it’s not the number of tags that should be taken into account but the variability of the tags. And that’s the important part.
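Worst-case series cardinality is roughly the product of the distinct values each tag key can take, so a back-of-the-envelope check like this hypothetical sketch can catch an explosive schema before any data is written:

```python
from math import prod

def estimated_cardinality(distinct_values_per_tag):
    """Worst-case series count: the product of the number of
    distinct values each tag key can take."""
    return prod(distinct_values_per_tag.values())

# A few thousand trackers x a couple of message channels stays small.
ok = estimated_cardinality({"tracker": 5000, "message_type": 2})
# Tagging raw latitude/longitude (say a million distinct values each)
# explodes into a trillion potential series.
bad = estimated_cardinality({"lat": 10**6, "lon": 10**6})
print(ok, bad)  # 10000 1000000000000
```

The point of the talk in one line: the second schema has only two tag keys, but its variability, not the tag count, is what kills it.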
Jonathan Schmidt 00:12:31.314 A second thing to remember is Group By. You can group by tags. You cannot group by fields. So whenever you know you will have to group by something, you’ll have to put it as a tag. If it has high cardinality, then you have to weigh the tradeoff. If it doesn’t have high cardinality, you’re good. No questions asked, just put it into a tag. You can Group By it. One thing to remember for that: you have to plan ahead. That’s one thing. Take your time with your first schema to actually get it right for now, but get it right also for tomorrow. Try to anticipate, because changing a tag plan can be really painful. The fact is that you’ll have to basically forklift your entire data set into the new tags. If it’s adding tags, that’s more okay. You can just add tags for the new data. And don’t forget not to include them in the workflows when you want the older data. But if you need to completely transform the way you tag, that’s going to be a lot of forklifting. So plan ahead.
Jonathan Schmidt 00:13:55.622 A little thing about that and about retention policies, which we’ll touch on more with privacy and the GDPR: if you try to write outside of the retention time of a retention policy—for example, if your retention policy is one month and you try to write data points two months old, you will get a write error. Some drivers handle it gracefully, some don’t. I know for a fact that one of the C# drivers doesn’t. That’s not typically much of a problem, but it can have you scratching your head and wondering why your implementation just doesn’t work and refuses to write data on your development environment, where the retention is small, but does work on the production environment where it’s longer. Well, look no further. Handling that case is actually something good to have in your code base, because it’s one less problem to worry about and one less thing that can go wrong if someone, at one point, feeds stale data into the database.
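A client-side guard against this write error is straightforward: check point timestamps against the retention window before sending, and drop or log anything too old. A minimal sketch, with the retention period as an assumed parameter:

```python
import time

NS_PER_S = 1_000_000_000

def within_retention(point_ts_ns, retention_s, now_ns=None):
    """True if the point's timestamp still falls inside the retention
    policy window and is therefore safe to write to InfluxDB."""
    if now_ns is None:
        now_ns = time.time_ns()
    return now_ns - point_ts_ns < retention_s * NS_PER_S

ONE_MONTH_S = 30 * 24 * 3600
now = time.time_ns()
day_old = now - 24 * 3600 * NS_PER_S
two_months_old = now - 60 * 24 * 3600 * NS_PER_S
print(within_retention(day_old, ONE_MONTH_S, now))         # True
print(within_retention(two_months_old, ONE_MONTH_S, now))  # False
```

Filtering a batch through this predicate before the write call keeps a short development retention policy from producing confusing errors that never appear in production.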
Jonathan Schmidt 00:15:19.560 Handling privacy. As you may know, on the 25th of May of this year, a new general regulation, the GDPR, went into effect in the EU, and anybody handling personal data from anyone in the EU needs to comply. What it means is people have a right to be forgotten. So that’s something to keep in mind. And you can only keep personal data for so long. Now, this doesn’t apply to non-personal data. For us, for example, position, geolocation, latitude and longitude, is something that is very private. So we are limited to a two-month window during which we can keep them, and then we have to destroy them. But that’s not the case for, say, speed, how much gas you have in your tank, what your RPM is in a given second. These things aren’t private. So these things fall under another policy and they can be kept for statistical analysis. For that, retention policies are actually awesome. When you create a retention policy, you say that data written under it will be automatically deleted after a set time. And Influx will actually do that for you. You have nothing to worry about. Influx will tombstone and delete those files as it goes about its business. One thing to keep in mind, though, and that’s linked to schema, is the private information you might have and the right to be forgotten of the regulation. What it means is, you have to be able to completely delete a series regarding a user, or a series regarding that user’s personal information. That means that your schema and your tags must reflect this requirement. You have to give every user his or her own series when you create your schema, if you want to use retention and the right to be forgotten. If not, then complying with the wishes of one user might mean deleting multiple users’ information at once, and that’s probably not something most of us want to do.
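If every user’s personal data lives in its own series, keyed by a pseudonymous tag, a right-to-be-forgotten request reduces to a single InfluxQL `DROP SERIES` statement. A hypothetical sketch (the measurement and tag names are illustrative, not WayKonect’s actual schema):

```python
def forget_user_query(measurement, user_tag, pseudonym):
    """Build the InfluxQL statement that deletes every point in the
    per-user series for a right-to-be-forgotten request."""
    return (f'DROP SERIES FROM "{measurement}" '
            f"WHERE \"{user_tag}\" = '{pseudonym}'")

q = forget_user_query("positions", "driver_id",
                      "c0ffee00-1111-2222-3333-444455556666")
print(q)
```

Because the schema gives each user their own series, this one statement removes that user’s data and nobody else’s, which is exactly the property the regulation demands.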
Jonathan Schmidt 00:18:12.000 Another thing: we have finite timings on how long we can keep data. The effective retention time for information in Influx is your retention period, say one month, plus your shard duration. Say I have a one-month retention policy with a shard duration of one day: Influx will actually create one shard every day for a month. And then on the 31st day, it’s going to create a new shard and delete the oldest one. So, basically, what you have is your retention plus the shard duration. Usually, that’s not much of an issue except when you have very long retention policies, where Influx actually recommends having a longer shard duration. You have to take that into account in your policy. Shard duration is actually important for performance, so don’t take that advice as meaning you should reduce it to the minimum. That is a bad, bad idea. You will blow up the shard count and it will basically make your database slow to a crawl. Every query will be slower because Influx will have to address multiple shards to give you an answer. That’s slow. So you have to strike a balance somewhere between shard duration, retention time, your requirements, and privacy.
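Because Influx only drops whole shards, the worst-case lifetime of a point is the retention period plus one shard duration. A toy calculation makes the tradeoff concrete:

```python
def max_point_lifetime_days(retention_days, shard_duration_days):
    """A point written at the start of a shard is only deleted when the
    whole shard ages out: worst case = retention + shard duration."""
    return retention_days + shard_duration_days

def live_shard_count(retention_days, shard_duration_days):
    """Approximate number of shards kept on disk at any moment;
    more shards means more work per query."""
    return retention_days // shard_duration_days + 1

# One-month retention, one-day shards: data can survive up to 31 days,
# and around 31 shards are live at once.
print(max_point_lifetime_days(30, 1), live_shard_count(30, 1))  # 31 31
# Same retention with one-hour shards: barely shorter lifetime,
# but roughly 24x the shard count to iterate over per query.
```

This is the balance being described: shrinking the shard duration tightens the privacy deadline only slightly while multiplying the shard count that every query has to touch.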
Jonathan Schmidt 00:19:50.458 One other thing not to do is inflate the shard duration and have a very small retention period, so that you can keep data and delete it at the right point without too high a shard count. If you do that, then you can write data at one point, and once it’s outside your retention policy, as I said, you’re going to get an error. So it might work for, say, things like metrics, transaction information, etc., but if you ever hope or have to handle old data points being input into your system, it’s going to fail. So that’s something you really have to keep in mind. One thing of note, though, is that queries across retention policies are possible. So even when you transform your schema and your system to handle privacy, you usually have only a small adjustment to make to your queries to add the retention policies after your FROM clause. You just add a comma, add the new retention policy, and if you switched fields from one to the other, you shouldn’t have too much trouble. Influx might send you back two series in the response, but that’s something that your driver can usually handle.
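Querying across retention policies only requires fully qualifying each measurement in the FROM clause. A sketch of building such a statement (the database, policy, and measurement names are made up for illustration):

```python
def cross_rp_query(database, measurement, policies, where=""):
    """SELECT across several retention policies by listing the fully
    qualified measurement once per policy in the FROM clause."""
    sources = ", ".join(f'"{database}"."{rp}"."{measurement}"'
                        for rp in policies)
    q = f"SELECT * FROM {sources}"
    if where:
        q += f" WHERE {where}"
    return q

# One short-lived policy for private position data, one long-lived
# policy for anonymized telemetry, read together in a single query.
print(cross_rp_query("telemetry", "positions",
                     ["two_months", "forever"],
                     "time > now() - 90d"))
```

As noted, the response may contain one series per retention policy, which client libraries generally merge or expose side by side.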
Jonathan Schmidt 00:21:18.219 Another thing about timing: you have to watch out for backups. Backups are covered by the GDPR, and this sometimes conflicts with other obligations because, as a tech company, you are required by law, in France, for example, to keep a journal of transactions, of what was done on your system, for a certain time, and to keep backups of your users’ data. But these backups also fall under the GDPR and privacy timings. So when you do back up InfluxDB, make sure either not to back up the private data, or to back it up into a separate file that you can delete specifically once the time is up. If you back up Influx on a standard one-year, one-month, one-week, and every-day policy, you run into problems when you want to delete some of the data in the backup but not the rest. And keeping that data is actually an offense under the new regulation. So that’s really a key point to keep in mind. Okay. So that’s it for legalese.
Jonathan Schmidt 00:22:33.437 On to the fun part: geotemporal data. What is geotemporal data, and why does it require a tip or trick to handle well in Influx? Geotemporal data is anything that evolves over time at a fixed point and can be measured at multiple points on earth. Let’s give a few examples, because that sounds completely obscure. Meteorological information like wind, rain, temperature. You can have traffic. You can have seismic activity. All these things. For this, usually, you want to have series that map out to places on earth. So you can index the place and you can index the time, thanks to Influx. Single-value geodetic systems make good tags. What is geodetic? Geodetic just means a coordinate system. So latitude plus longitude is a poor tag choice for Influx, because you’ll have two tags and a very, very high cardinality. You have a few single-value geodetic systems out there, like geohash or the MGRS, the Military Grid Reference System. These allow you to encode location data into one string. And this allows you to create tags and index them correctly in Influx. It’s actually much more efficient than using latitude and longitude. So with geotemporal data, the biggest trouble is cardinality. When you happen to have a lot of sensors all over the world, or you need to handle a lot of traffic and ride information, like us, you’re going to have lots of points on earth. And so, if you just take a geohash, which can be up to 12 characters long, and put that as your tag, no questions asked, you’ll end up with up to a million trillion (32^12) possible series. Let’s just say that’s a bad idea. Influx will definitely not like something like this.
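A geohash encoder is short enough to sketch here. This is the standard public-domain algorithm (interleave longitude and latitude bits, emit base-32 characters), not anything WayKonect-specific:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # the geohash alphabet

def geohash(lat, lon, precision=5):
    """Encode a latitude/longitude pair as a geohash string.
    Bits alternate longitude/latitude, each halving the bounding box."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    result, bit, ch, even = [], 0, 0, True
    while len(result) < precision:
        if even:                      # longitude bit
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                ch = (ch << 1) | 1
                lon_lo = mid
            else:
                ch = ch << 1
                lon_hi = mid
        else:                         # latitude bit
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch = (ch << 1) | 1
                lat_lo = mid
            else:
                ch = ch << 1
                lat_hi = mid
        even = not even
        bit += 1
        if bit == 5:                  # five bits form one character
            result.append(BASE32[ch])
            bit, ch = 0, 0
    return "".join(result)

print(geohash(42.605, -5.603))  # 'ezs42' (the classic example)
```

Nearby points share a common prefix, which is exactly the property that makes truncated geohashes usable as coarse location tags.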
Jonathan Schmidt 00:25:21.042 So let’s take the geohash to see a bit what we’re speaking about. A geohash is a way to encode a location into a string drawn from an alphabet of 32 different characters. So, usually, the more characters you add to the string, between 1 and 12, the more precision you have. A one-character geohash covers a cell about 5,000 kilometers across. So, basically, you can probably say it’s somewhere on a continent, and that’s about it. The more characters you add, the more precise you get. When you’re down to around five characters, you are at a cell about five kilometers across. It starts to be an interesting point. Then when you go past that, you can get down to sub-meter, even centimeter, precision. Basically, a postage stamp. So 12 characters is a postage stamp. However, the more characters you add, the more cardinality you will run into. At five characters, you are already at 33.5 million possible series. As I said, that’s a lot, and it’s going to stress your hardware. So what can you do? What can you do to actually encode the data? Well, there are two possible tricks. Either you have data that comes from everywhere on earth, but not very often. Then you can just limit your tag size to something suitable, like four, five characters tops, and you randomize any time component that is outside what you need for precision. Let’s say you have 1-second precision—you have data coming every second—well then, everything from a second down to the nanosecond, so from tenths of a second to the nanosecond, is actually data you will never need. But Influx indexes it anyway. So why not use it? Well, you can just draw a random number somewhere between zero and one billion minus one, and input that as the sub-second part of the timestamp for your data. That way you have a statistically insignificant chance that two points will collide and you lose one of the two.
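The first trick, randomizing the unused sub-second nanoseconds so many points can coexist within one indexed second, might look like this in Python. The collision estimate in the comment is a birthday-problem approximation:

```python
import random

NS_PER_S = 1_000_000_000

def randomized_ts(ts_ns):
    """Keep one-second precision, fill the sub-second nanoseconds with
    a random value so points in the same second rarely collide."""
    second = ts_ns - (ts_ns % NS_PER_S)
    return second + random.randrange(NS_PER_S)

def collision_probability(n_points_per_second):
    """Birthday approximation: p ~= n^2 / (2 * 1e9) for n points
    sharing one second of randomized nanoseconds."""
    return n_points_per_second ** 2 / (2 * NS_PER_S)

ts = randomized_ts(1_528_200_000_123_456_789)
print(ts // NS_PER_S)              # 1528200000: the second is preserved
print(collision_probability(1000)) # 0.0005
```

As noted later in the Q&A, this probability grows with the square of the write rate, which is the approach’s main limit.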
Or, as before, you encode as much data in the tag as the tag will allow, and then you look at the part of the timecode you don’t need, like maybe everything under a second you will never use, or everything under a minute you will never use, and you encode, deterministically, the rest of the geohash, or as much of it as you can, into that part.
Jonathan Schmidt 00:28:44.276 So if you stay at, let’s say, one-second precision, you have between zero and one billion minus one, that’s about 29 bits of data that you can repurpose to index the rest of your geohash. Roughly 29 bits. Unfortunately, one character is five bits. So that gives you only five characters of precision. Four plus five or five plus five is actually very nice precision. But if you don’t need the second, if you’re just down to the minute, which, let’s face it, for meteorological or traffic data is more than enough, then you can keep one-minute time precision, and everything after that is location data, which gives you 35 bits to play with, which gives you seven characters. Combined with the first five that you encode in the tag, it gives you 12 geohash characters and the precision of a postage stamp. So that’s our tradeoff. That’s actually how we do it in production. We encode five characters. That’s, theoretically, 33 million series. But doing rough calculations, land mass is about 30% of the total area on earth. So you have about 10 million series left. And of those, roads will typically represent less than 10%. So we are under a million. That’s why we chose five characters. Five characters in the tag and one-minute time precision: 12 geohash characters in total, so almost full precision on position with a one-minute time precision.
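The deterministic variant described here, five geohash characters in the tag and the remaining seven (7 × 5 = 35 bits, which fits under the 60 billion nanoseconds of a minute) packed into the sub-minute part of the timestamp, can be sketched like this. This is an illustration of the idea, not WayKonect’s production code:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # the geohash alphabet
NS_PER_MINUTE = 60 * 1_000_000_000

def pack(ts_ns, geohash12):
    """Keep one-minute time precision; hide geohash characters 6-12
    (7 chars x 5 bits = 35 bits < 60e9 ns) in the sub-minute part."""
    minute = ts_ns - (ts_ns % NS_PER_MINUTE)
    payload = 0
    for ch in geohash12[5:12]:
        payload = (payload << 5) | BASE32.index(ch)
    return minute + payload

def unpack(ts_ns):
    """Recover the minute-precision timestamp and the 7 hidden chars."""
    payload = ts_ns % NS_PER_MINUTE
    chars = [BASE32[(payload >> (5 * i)) & 0x1F] for i in range(6, -1, -1)]
    return ts_ns - payload, "".join(chars)

minute_ts, tail = unpack(pack(1_528_200_000_123_456_789, "u140jdhrbnyx"))
print(tail)  # 'dhrbnyx': the last 7 characters round-trip intact
```

Unlike the randomized approach, this encoding is deterministic: a re-sent point for the same place and minute maps to the same timestamp, so it overwrites rather than risking a collision with unrelated data.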
Jonathan Schmidt 00:30:50.526 My advice on that: also add a field with the complete geohash to your points. That way, you can query on the tags, and then you can use a WHERE clause with either a string-equality comparison, or just a simple regex matching strings that start with something. That way, you can aggregate data on one geohash square, from 1 character of precision down to 12. Another thing we did: instead of putting one tag with the full five characters of the geohash, we actually separate it into five tags, so that we can query each character separately. That way, you have full control of the query and the support of the InfluxDB indexing engine. That’s it for the geotemporal data and for this webinar. I’m open to any questions you may have. So don’t hesitate.
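Writing the first five geohash characters as five single-character tags, with the full geohash carried as an unindexed field, might look like this (the tag and field names are illustrative):

```python
def geopoint(gh12):
    """Five single-character tags (gh1..gh5) give the index coarse
    location control; the full geohash travels as a plain field."""
    tags = {f"gh{i + 1}": gh12[i] for i in range(5)}
    fields = {"geohash": gh12}
    return tags, fields

tags, fields = geopoint("u140jdhrbnyx")
print(tags)   # {'gh1': 'u', 'gh2': '1', 'gh3': '4', 'gh4': '0', 'gh5': 'j'}
print(fields) # {'geohash': 'u140jdhrbnyx'}
```

A query can then filter on any prefix length (`gh1` alone for a continent-scale cell, `gh1` through `gh5` for a few-kilometer cell) using indexed tags, and fall back to the geohash field for finer aggregation.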
Chris Churilo 00:32:14.574 All right. Wow, that was cool. I really love that example that you gave of stuffing that geotemporal data into the DB. I have a question. But before I ask my question, I do want to remind everybody, if you have a question for Jonathan, please post your question into the chat or the Q&A panel and we’ll definitely get to those. In the meantime, I wanted to go back a little bit on the privacy slide, about the retention policies. So you had mentioned that there are laws in France about backing up the user data as well?
Jonathan Schmidt 00:32:54.959 Yes. There are a few regulations. Mostly, they are about journals. They are about logs of who did what on your IT system. Some applications and some fields will also require backups.
Chris Churilo 00:33:16.402 Okay. And—
Jonathan Schmidt 00:33:17.672 Especially if you are touching financial data or HR-related information, which we kind of do.
Chris Churilo 00:33:28.796 I see. Oh yeah, I guess that makes sense because these are employees—
Jonathan Schmidt 00:33:33.472 Exactly.
Chris Churilo 00:33:33.975 —of your customers. Okay. So that makes sense. And I thought that was really good advice to split out your data that you think that the users would want to delete into a separate series because then, yeah, that makes it completely easy to take care of it. Something that I hadn’t really thought about.
Jonathan Schmidt 00:33:53.582 Or then to automate it with your web portal.
Chris Churilo 00:33:56.879 Yeah. I thought that was really cool—
Jonathan Schmidt 00:33:57.649 [crosstalk] to be forgotten, you can just do it with one query on the DB, and it will take care of everything for you.
Chris Churilo 00:34:05.487 Yeah. And I guess the nice thing is you—the only reason that you guys are using it is to help with your solution. And I guess after so much time anyways, it’s probably not that much use to you in its raw form anyways. So it makes sense to just delete it. And then the good thing is that you’re adhering to the regulation as well.
Jonathan Schmidt 00:34:29.438 Indeed.
Chris Churilo 00:34:31.492 So then I do want to talk a little bit more about this geohash that you had, looked like, two different versions of ways of handling it.
Jonathan Schmidt 00:34:42.918 Yes. Actually, the first one was a tip that was given to me by Ed Bernier when I actually asked him about high series cardinality and geohashing for geotemporal data. Basically, if you randomize any part of the timing precision you don’t need, then you can input a lot of data, potentially up to 1 billion points of data into one second, and have the one field handle the positioning data. It’s a good approach. But as your number of writes increases, your statistical probability of a collision increases as well.
Chris Churilo 00:35:33.507 So—
Jonathan Schmidt 00:35:34.822 So it’s a good [inaudible].
Chris Churilo 00:35:36.385 So just for the audience, when would you recommend going with the first method versus the method you guys ultimately went with?
Jonathan Schmidt 00:35:50.035 First method—
Chris Churilo 00:35:50.815 Do the pros and cons for both.
Jonathan Schmidt 00:35:53.107 Well, for example, if you have wind data or weather data, and your squares are maybe five kilometers wide or even less, then it’s a good way, because even if you encode the full precision of your weather station in it, you won’t have a report every minute. So if your rate of ingestion—the number of data points you treat—is lower, then actually randomizing might be easier on the system. But, honestly, in most cases, I would recommend encoding as much data as you can into the timecode, unless you need a very high degree of precision, both on time and position. If you want sub-second precision with 10, 11, 12 characters of geohash, you don’t have the means to encode all of it into tag plus time. So in that case, it makes sense to randomize anything in your time that is under the precision you want, and input data that way. But this approach will have its limits.
Chris Churilo 00:37:31.658 That makes sense. Cool. We do have a question in the Q&A. Let me read it out loud. So do you use user IDs as tags?
Jonathan Schmidt 00:37:42.916 That’s a tough one. As a rule, no. Using user IDs straight in your database might be a problem security-wise, because then you expose some data from your system to the outside, or even to the inside. If you have a single ID for your user, and it follows them everywhere, it’s kind of like identifying information. It’s kind of like putting their name or their email into the tag. And that’s usually not a good practice. What I recommend for that is generating, in your primary DB, a separate ID for your user. GUIDs work nicely for that. A completely random GUID, and use that to make the transition between your main DB system and Influx.
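The GUID pseudonym approach is nearly a one-liner with Python’s `uuid` module. A sketch of keeping the mapping in the primary database (represented here by a plain dict) so that Influx tags never carry a real user identifier:

```python
import uuid

pseudonyms = {}  # in practice this mapping lives in the primary DB

def pseudonym_for(user_email):
    """Return a stable, random GUID for a user; Influx tags only ever
    see this GUID, never the identifying email or name."""
    if user_email not in pseudonyms:
        pseudonyms[user_email] = str(uuid.uuid4())
    return pseudonyms[user_email]

p1 = pseudonym_for("driver@example.com")
p2 = pseudonym_for("driver@example.com")
print(p1 == p2)        # True: stable per user
print("driver" in p1)  # False: nothing identifying leaks into the tag
```

Deleting the mapping row in the primary DB (together with the `DROP SERIES` on the Influx side) then severs the link between the person and their time series data.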
Chris Churilo 00:38:43.160 Okay. And then he asks a follow-up question. If you were to use user IDs as tags, he was wondering, if so, wouldn’t that greatly increase cardinality, especially coupled with geotemporal data. And then also asking, approximately how much cardinality do you deal with and what cluster sizing did you find is required to handle this?
Jonathan Schmidt 00:39:05.173 Okay. Handling geotemporal data and user IDs together really will create a lot of cardinality. When we deal with geotemporal data, usually, we use fields. We use tags only for the positioning, and we use fields for everything else. You can, maybe, split into measurements, but even that will greatly increase your cardinality. That is not something you want. As for what we deal with: currently, we use two series per tracker for the telemetry information and about two series per tracker for metrics. We have a few thousand trackers today, so our cardinality, including our sandbox environment, should be under 50,000. This currently works well with dual-core CPUs and about 8 or 16 gigabytes of RAM. It’s a two-node cluster. This was tested up to a cardinality of 100,000, 200,000. Beyond that, I would recommend going for RAM, going for memory first. If you write a lot or if you have high cardinality, then memory will be your bottleneck. If you query a lot, then CPU will be. Basically, if you write lots of series, you’ll need memory. If you read a lot, you are going to need CPU and very fast disks.
Chris Churilo 00:41:06.988 All right. So I want to make sure—let Jonathan know if that answered your question and if you have a follow-up question. And he says, “Cool. Thanks.” Okay. Cool. We have another question. So since InfluxDB scales well, is it freeing the company up to focus on growth rather than worry about how to scale?
Jonathan Schmidt 00:41:28.243 Indeed. That was one of my primary concerns when I chose the technology for time series. And it’s also why we chose our other two technologies to handle our data. We chose a graph database, which is called Neo4j, and Kafka as a messaging backbone for the infrastructure. That really was one of the main points of our choice, scalability. Particularly as a startup, you need to not worry about scalability. Think about it when you choose your technology because changing technology is really hard. So keep it in mind when you choose it for the first time.
Chris Churilo 00:42:13.812 So since we have a little bit of time, I do want to ask, what does your data architecture look like? So you’re collecting the telemetry from the cars, and then it’s going—how does it get into the system?
Jonathan Schmidt 00:42:28.246 It’s buffered on the trackers themselves, usually. For car manufacturers, it usually transits via their servers and ends up on a standard HTTP endpoint on our side. For most trackers, usually, you buffer and send an HTTP call, maybe a POST call, maybe every minute or two minutes. This call writes straight to Influx and to Kafka. It writes all the data to Influx and what we need to calculate metadata like [inaudible] into Kafka. And we have a few workers that read the data from Kafka and input it into Neo4j, our graph database, which holds the vehicles, the drivers, the trips, the events when we have something special to note, like a driver identification, or an accident, or something like that. This also tracks maintenance. So, basically, every user and all metadata is in Neo4j, and the workers and the SaaS console actually write to it. Everything else, all raw telemetry, is in Influx. As well as positions, and this is why we actually needed retention policies on Influx.
Chris Churilo 00:43:59.924 Which is, once again, the private data.
Jonathan Schmidt 00:44:03.348 Exactly.
Chris Churilo 00:44:05.250 Cool. If there are any other questions, feel free to post them in the chat or Q&A. Just want to remind everybody, this session has been recorded and the automated email will be sent to you tomorrow. And if you’re keen to listen to it again, then just go ahead and go to the same URL that you used to register for this webinar. And this happens quite often, so a lot of times people will find that they do have questions later on. So feel free to email me. You guys have my email address with the registration details. So I will be more than happy to forward your questions to Jonathan. Jonathan, this was very interesting. Especially the geotemporal data, I think we’re going to get a lot of questions about that because I know we get a lot of inquiries about how people can add that information because it is important. But I do appreciate that what you presented today was also a good balance with privacy, because as cool as geotemporal data is, we do have to consider the privacy of the person that’s often being tracked with that data. So I really appreciate that balance that you presented to everybody today.
Jonathan Schmidt 00:45:21.778 Thank you very much. And yeah. Feel free to send questions. I’ll be happy to answer them.
Chris Churilo 00:45:26.981 Okay. Here, we have another question. Cool. Okay. Do you use InfluxDB for any kind of alerting or monitoring?
Jonathan Schmidt 00:45:33.817 Yes, indeed. As I said, we collect metrics not only about the trackers themselves, but about our architecture, so our virtual machines and our worker threads that read from Kafka. Most of our infrastructure actually feeds metrics data into Influx, and we then use Grafana to build dashboards and visualize it.
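The speaker doesn't say which collector feeds these infrastructure metrics into InfluxDB, but a common way to set this up is with Telegraf. As a purely illustrative sketch (the URL, database name, and plugin selection are assumptions, not WayKonect's configuration), a minimal `telegraf.conf` collecting host metrics into InfluxDB for Grafana to read might look like:

```toml
# telegraf.conf -- hypothetical minimal setup; endpoint and database
# names are placeholders, not WayKonect's real configuration.
[agent]
  interval = "10s"

[[outputs.influxdb]]
  urls = ["http://influxdb.example.internal:8086"]
  database = "infrastructure"

# Host-level VM metrics
[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]

[[inputs.disk]]
```

Grafana then points at the same InfluxDB database as a data source for the dashboards.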
Chris Churilo 00:46:00.998 You also have dashboarding that’s part of your solution, and that’s using this other graphing tool that you were talking about, right?
Jonathan Schmidt 00:46:11.484 We have some visualization and graphing going on in the solution, but usually it's developed entirely in-house. The real data from Influx that we show to our users is positions, to show them the trips, the routes the vehicle took. That's about the only view we need. And this is overlaid on a map, OpenStreetMap or another standard, on the user's screen. So we don't really present Grafana dashboards to our users. We had one use case where we could have done it, for vehicle maintenance, because we had a garage that actually asked us to give them telemetry data, health indications about the vehicle, to do better maintenance. Then, we considered sharing some dashboards with them. But as a rule, we usually develop our own visualization tools for our users.
Chris Churilo 00:47:21.712 Yeah. That's pretty standard with a lot of our users. So a lot of times, it's going to be two different sets of visualization tools, one for internal use and one tied into the solution you're building to provide your users with some kind of a view. Oh, this was really great. I love it. I think we all kind of geeked out on it. And thank you so much for your time and for sharing these great tips. I also think, at the beginning of your presentation, I kind of felt like maybe I should steal that for our trainings, because I don't know how many times I tell people, "Be careful. I know it's easy to get started with InfluxDB, but please do be careful with the data that you put in, because it can bite you. It is schemaless, but it can bite you if you put your data in a field or a tag when it was supposed to be the other."
Jonathan Schmidt 00:48:14.322 Indeed. And if you do, be sure to include the fact that you absolutely need every tag to be a string and nothing else, and that if you write a field with one type of data, you just can't write another type of data to the same field in the same shard. That kind of thing blows up.
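These two gotchas are visible directly in InfluxDB's line protocol: tag values are always stored as strings, while field values carry a type (float, integer, string, or boolean) that must stay consistent for a given field. A small helper like the one below (an illustrative sketch, not an official client function) makes the typing explicit; writing `engine_rpm=2500i` and later `engine_rpm=2500.0` to the same field within the same shard is the kind of conflict the speaker is warning about.

```python
def field_repr(value):
    """Render a field value in InfluxDB line protocol syntax.

    Booleans must be checked before ints (bool is a subclass of int
    in Python). Integers get an 'i' suffix, strings are quoted, and
    floats are written bare -- the suffix is what fixes the field's
    type for the rest of the shard.
    """
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted, with embedded quotes escaped.
    return '"' + str(value).replace('"', '\\"') + '"'
```

So `engine_rpm=2500i` declares an integer field, and a later write of `engine_rpm=2500.0` (a float) to the same field in the same shard is rejected with a field type conflict, whereas a tag like `vehicle=v42` never has this problem because tags are strings by definition.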
Chris Churilo 00:48:36.374 Awesome. All right. Thanks, everybody. And I know we’ll probably get questions later on. And Jonathan, once again, thank you so much, and we look forward to chatting with you again.
Jonathan Schmidt 00:48:48.582 You’re welcome. It was my pleasure to do this webinar.
Chris Churilo 00:48:52.529 Bye, guys.
Jonathan Schmidt 00:48:54.652 Bye.