Industrial IoT | Live Demonstration
Session date: Jul 13, 2023 07:00am (Pacific Time)
In this live technical demo, discover through real-life use cases how industrial, energy, and manufacturing companies use InfluxDB in their production environments to support predictive analytics, improve overall equipment effectiveness, and reduce downtime.
Join our live demo to learn how InfluxDB can help you:
- Provide real-time monitoring of operational technology devices at scale
- Improve overall equipment effectiveness and enable proactive operations
- Deploy an intelligent edge by connecting real-time edge devices and gaining a single pane of glass for visibility into all edge data
- Lower storage costs and TCO, and access historical data without sacrificing any part of your analysis
- Connect with machine learning and data science tools to gain better insight and build the foundation of your industrial analytics
[et_pb_toggle _builder_version=”3.17.6” title=”Transcript” title_font_size=”26” border_width_all=”0px” border_width_bottom=”1px” module_class=”transcript-toggle” closed_toggle_background_color=”rgba(255,255,255,0)”]
Here is an unedited transcript of the webinar “Industrial IoT - Live Demonstration”. This is provided for those who prefer reading to watching the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.
- Jay Clifford: Developer Advocate, InfluxData
- Ben Corbett: Solutions Engineer, InfluxData
Jay Clifford: 00:00:06.752 Awesome. So we can see some people slowly coming in. We’ll get started in about a minute, just let some more people pile in. What we’d like to do in the chat really quickly is if you could tell us where you’re from, that’s always an exciting one to know. Both myself and Ben are from the UK, me up in Scotland, the north. Ben, what about you?
Ben Corbett: 00:00:27.865 I’m down in London today. Yeah, East London.
Jay Clifford: 00:00:32.473 So north and south. We actually have someone from Kiev in today as well, really cool. Right. I think, so we’re bang on time. We will get cracking. So I’ll do a quick introduction and then I’ll let Ben take it away. So thank you all for joining us today. Right before we get started, I would like to remind everyone of some housekeeping. So this webinar is recorded and will be shared and available on demand in the next 24 hours along with the slides. So any links that you see in the slides, don’t worry, you’ll get them after. If you have any questions, make sure you use the Q&A at the bottom of your screen and we’ll get your questions there and we’ll answer them all right at the end. If you have any more questions, make sure you join our InfluxDB Slack community and Discourse. Tons of Influxers and other community members in there always asking questions. So please jump in there and we’ll get them answered for you as well. So I am extremely excited to introduce our customer-facing solutions engineer, Ben Corbett. Without being biased, you might actually be my favorite solutions engineer, but yeah, we won’t tell anyone else that. So, Ben, the floor is yours. Why don’t you tell us a bit about yourself and what you’ve been working on?
Ben Corbett: 00:01:53.257 Thanks very much, Jay. That’s an incredible bar, impossible to live up to. So yeah, fantastic to meet you all. So for those of you that haven’t met me before, my name is Ben Corbett, I’m a solutions engineer at InfluxData. So essentially, what that means is my role is to make sure that InfluxDB is going to be a good fit for your use case, looking through best practices and making sure that you’re going to be getting the most out of your solution. So I actually am incredibly passionate about this subject, so I was really happy to be leading this webinar for you. I come from a long background in IoT application development and platform development. So a lot of industrial IoT, manufacturing, agriculture, construction, energy, and even electric vehicles. And before I joined InfluxDB, it started to be central to a lot of the platforms that I was developing. So yeah, I came over to join these guys about a year and a half ago, and I’m really excited to see that I’m almost exclusively working on IoT projects for InfluxData. So although we’re a metrics database that handles a lot of different things, I’m really happy to see how well we fit in this particular segment. So yeah, I’m going to share with you some of our specific capabilities in this area.
Ben Corbett: 00:03:06.879 So yeah, the agenda today: I’m going to start off with a tiny bit about InfluxDB; hopefully you already know a little bit about us. I’m going to be digging into some version 3.0 functionality: what were the problems we tried to solve with 3.0, and what are the capabilities that map to them? Then I’ll be digging into one specific use case with one of our customers, Teréga. If you didn’t catch the webinar that Teréga led a couple of days ago, we’re going to put a link in the chat here, so please catch up on that. Really cool stuff that they’re doing with our platform. And then I’m going to be jumping into the demo. We’ve got a locally hosted demo for you. I’m going to be clicking through a Telegraf config for an IoT use case and some Grafana dashboarding, as well as some 3.0-specific functionality. So let’s get cooking.
Ben Corbett: 00:03:56.437 So what is InfluxDB? At its core, as I’m sure a lot of you know, it is a time series database. This basically means it’s a database that is extremely performant at accepting high-volume, high-rate writes, storing them efficiently, and serving them really efficiently. That means lots and lots of time series, like billions of points per second, and we’re really performant at handling those time-series-specific queries. What else is it? We’ve got our API and query languages, InfluxQL and SQL: how you interact with your InfluxDB, how you integrate with it, and how you build solutions around it. The API isn’t just how people interact with their InfluxDB; it’s also how they manage it. So we’ve got that best-in-class API, as well as client libraries in all of the major languages, which you can use. Data collectors: of course, what we want to do is make it really easy for people to stand up InfluxDB and use it. So we have a big focus on developer productivity, time to awesome, ease of use, whatever you want to call it. For me, this comes in the form of our open-source, plugin-driven data collection agent called Telegraf, and also the client libraries that you can use to quickly grab data and put it into your InfluxDB.
Ben Corbett: 00:05:16.771 So, 3.0. Without sugarcoating it, 3.0 is a complete rewrite of the underlying storage engine of InfluxDB. It’s been redeveloped in Rust using open-source projects and standards like Apache Arrow and Parquet, and the key problems we wanted to solve relate to things we heard from our customers over and over again. The engine itself was actually finished well before I joined the company. But as I’m sure a lot of you who work in software appreciate, it’s just as difficult to productionize it and get it into your offering suite as it is to create the underlying storage engine itself. So that’s what’s been happening this year. It has been available in Cloud Serverless since January and in Cloud Dedicated since April, and Clustered, the newer version of Enterprise, will be landing in late August or early September.
Ben Corbett: 00:06:13.509 So why did we build 3.0? There are a couple of different reasons. One of the first is that there’s been an exponential increase in the volume of time series data across all of our different verticals. Traditionally, InfluxDB has been a really good fit for metrics: streams of values over fixed or relatively slow-growing sets of series. However, we really struggled to tackle high-cardinality use cases. Events, logs, traces, anything where the tag set was essentially unbounded: cardinality was the treacle that slowed the internal operations of your database down. We never wanted it to be an issue for our customers, but for those unbounded-cardinality use cases it was. So we wanted to add support for unlimited cardinality so we can position InfluxDB as the time series database of choice for all of those different time series workloads, not just metrics.
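To make the cardinality point concrete, here is a small sketch. Series cardinality grows with the number of unique tag sets per measurement. A bounded tag such as a machine ID stays small no matter how many points arrive. An unbounded tag such as an event ID creates a new series with every point. This is plain, illustrative Python, not InfluxDB code. The measurement and tag names are hypothetical, and field keys are ignored for brevity.

```python
from typing import Iterable, Mapping, Tuple

Point = Tuple[str, Mapping[str, str]]  # (measurement, tags); fields omitted for brevity


def series_cardinality(points: Iterable[Point]) -> int:
    """Count distinct (measurement, tag set) combinations in a stream of points."""
    seen = set()
    for measurement, tags in points:
        # Sort tag pairs so {"a": "1", "b": "2"} and {"b": "2", "a": "1"} count as one series.
        seen.add((measurement, tuple(sorted(tags.items()))))
    return len(seen)


# Bounded tag: three machines produce three series, however many points arrive.
metrics = [("machine_data", {"machine_id": f"machine{i % 3}"}) for i in range(1000)]
# Unbounded tag (e.g. an event ID): every point starts a new series.
events = [("events", {"event_id": str(i)}) for i in range(1000)]

print(series_cardinality(metrics))  # 3
print(series_cardinality(events))   # 1000
```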
Ben Corbett: 00:07:14.752 Yeah, the one in the middle: database customers are never going to be upset by faster queries and better compression. Let’s double down on core database components, storage, write performance, and query performance, and give database customers what they really want. So there’s been a really big focus on in-memory caching, vectorized execution, pushdowns, and those more analytical queries, so hopefully we can give you better performance. The one on the top right here is one that I’m particularly passionate about, because I’ve heard it from customers time and time again. Customers always said to me, “InfluxDB is fantastic, but SSDs are expensive. It’s a hot access tier. When are we going to have cold storage? I’m sick of building my own archiving solution and pushing data out when I still get value from keeping it within InfluxDB.” So I’m really proud to announce that we’re introducing a low-cost object storage tier, which represents our cold storage. It comes not only with advanced compression but also with a more attractive unit cost for that cold tier. So customers will no longer be forced to move data out of InfluxDB and use quite short retention policies purely for commercial reasons. You can keep it where you get value out of it.
Ben Corbett: 00:08:33.852 The one on the bottom left here is another one customers raise time and time again: SQL. It’s the de facto query language of choice, with a really low barrier to adoption and a really low learning curve, so customers kept asking, “When are we going to have SQL support?” So yeah, again, I’m proud to announce that with 3.0 we have Native SQL support as one of our query languages, and InfluxQL is also going to benefit from a lot of those performance improvements. Those are the two query languages that will be carried forward in 3.0. And last but not least, I know this one’s a little bit cryptic, but essentially there’s been an ethos switch away from building our own protocols and toward less of a walled-garden approach. For example, TSM was the underlying file format, and we’re moving towards Parquet now. There are also integrations with SQL-based tools now that we’ve got Native SQL: ODBC bridges, JDBC bridges, and integrations with the likes of Power BI, Tableau, and Superset. We’re trying to play much more nicely with the ecosystem so that InfluxDB can be a whole lot more useful throughout your tech stack.
Ben Corbett: 00:09:42.014 The last thing I’m going to share on 3.0 is this diagram. There are two things I really want to highlight here. Number one is how DataFusion, Arrow, and Parquet interact across the hot and cold storage tiers. As queries come in, DataFusion is the query engine that plans and executes them across wherever your data resides. Apache Arrow is the in-memory columnar format, the counterpart to Parquet on disk, so it’s really, really performant for working with Parquet data. Parquet data that sits in cold storage is pulled into memory to be worked on, and data resides here at the leading edge while it is being written to or worked on, for a period you can configure. So that’s how the hot tier works: a really, really performant hot tier, and then data graduates into the cold tier in that Parquet file format.
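Here is a simplified mental model of the hot/cold split. It is not how the storage engine actually decides where data lives. Assume that everything newer than a configurable window is served from memory and everything older from Parquet files in object storage. A single query's time range can then straddle both tiers. The function name and the `hot_window` parameter below are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

Range = Tuple[datetime, datetime]  # half-open [start, end)


def split_query_range(start: datetime, end: datetime, now: datetime,
                      hot_window: timedelta) -> Tuple[Optional[Range], Optional[Range]]:
    """Split a query range into a cold part (older than the hot window) and a hot part."""
    boundary = now - hot_window  # hypothetical cut-over point between the tiers
    cold = (start, min(end, boundary)) if start < boundary else None
    hot = (max(start, boundary), end) if end > boundary else None
    return cold, hot


now = datetime(2023, 7, 13, 12, 0)
cold, hot = split_query_range(datetime(2023, 7, 13, 10, 0), now, now, timedelta(hours=1))
print(f"cold: {cold[0]:%H:%M}-{cold[1]:%H:%M}, hot: {hot[0]:%H:%M}-{hot[1]:%H:%M}")
# cold: 10:00-11:00, hot: 11:00-12:00
```

In this model, only the older slice would need Parquet files read from object storage. The recent slice would come straight from memory.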
Ben Corbett: 00:10:43.777 The second thing I want you to take away from this diagram is this back door that we have here. InfluxDB has been a really good fit for a very long time for the leading edge of data and for typical time series queries: aggregates, things like that. Where it has not been a good fit is for customers who want to run very intensive, computationally expensive machine learning workloads over a vast set of highly granular data. That is more akin to a data export. We found a lot of our customers were building their own cold storage with files, and that was what they used to feed their data scientists. And we all know that you need to keep your data scientists topped up with Parquet files. So now we’re going to give them read-only access to those Parquet files through this back door. It’s going to come in a couple of forms. This doesn’t exist in 3.0 yet, but by the end of the year we’re looking to have the first version available for you, allowing you to execute those really expensive workloads read-only, directly on object store, without impacting the resource utilization of the typical time series queries that come in through the front door.
Ben Corbett: 00:11:59.403 Yeah. So obviously, many of us work across multiple environments. One of the key aspects of InfluxDB, and why I joined, is that we have something that fits everyone. We have industrial IoT customers with air-gapped environments that need a self-hosted edition, which is Enterprise or, soon, Clustered. Cloud comes in two flavors: Cloud Dedicated, which has more of the enterprise features, and Cloud Serverless, which is more about agility and consumption-based pricing. And there’s open source for people with single-node deployment requirements, the edge, and local development. There’s one API across all of them, so we really believe you can use InfluxDB in hybrid deployment models and get even more out of it. I just wanted to nod to that. So, specifically, how do we fit in industrial IoT? For a while, a lot of our customers stood up InfluxDB just for monitoring, diagnostics, and alerting use cases. They would spin up some dashboards and it would sit there. Now we’re looking to promote InfluxDB as a centralized solution that you can host for many different sites, maybe even in the cloud, which can reduce the time to insight across your organization, promote agility, and allow you to build lots of services on top of this hub that collates all of your different data.
Ben Corbett: 00:13:21.891 The idea here is that as we improve that time to insight, we should be shifting away from reactive monitoring and towards proactive monitoring. Getting into the likes of predictive maintenance, with our support for more analytical features coming up, is one of the things we’re really focusing on in the industrial IoT space. This is just a really quick reference architecture. It’s nice because it has a lot of information on it, though that’s not always good when you’re sharing your screen and people need to digest it. What I want you to take away is that we have a lot built into InfluxDB that specifically tackles this vertical. Among the IoT-specific Telegraf plugins, OPC UA and MQTT are probably some of our favorites. So we can integrate with your SCADA system, your HiveMQ broker, and Modbus as a protocol. Whichever it is, let’s have a conversation and see how we can use Telegraf to integrate with your data service, or how it can fit into your ingestion architecture.
Ben Corbett: 00:14:30.398 If that doesn’t cut it, we’ve obviously got a whole host of client libraries in all of the major languages, and this isn’t even the complete list. These are really neat because they have a lot of the right InfluxDB optimizations out of the box, things like batching and ordering, and they create the line protocol for you. So this is a really nice way to flexibly develop your own client. And then, last but not least, we have a lot of partner solutions, notably PTC ThingWorx and Kepware. A lot of the major players in the industrial IoT space have an InfluxDB connector, so make sure you check their docs and see whether they can let you put data in InfluxDB in a couple of clicks. And this is a nod to a feature that, again, I’m really passionate about. I had lots of customers in the industrial IoT space with two requirements. One was local storage: “I need to keep the plant operators happy, and I need that on-prem redundancy, that local copy.” The other: “A lot of my mobile, or maybe fixed, assets have intermittent connectivity. How can I produce an architecture that is both durable and gives me that edge storage capability?” This is one of the features we’ve got that makes it really quick for you to set up the edge-and-hub architecture we like to talk about.
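For a sense of what "creating the line protocol for you" involves, here is a dependency-free sketch of line protocol serialization. It is not the code of any official client library. The measurement, tag, and field names are hypothetical. It covers the basic rules: escape commas, spaces, and equals signs in names; add an `i` suffix to integers; quote strings; and use an optional nanosecond timestamp.

```python
from typing import Mapping, Optional, Union

FieldValue = Union[float, int, bool, str]


def _escape(text: str, chars: str) -> str:
    """Backslash-escape each character listed in `chars`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # String fields are double-quoted; escape backslashes and quotes inside them.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line_protocol(measurement: str, tags: Mapping[str, str],
                     fields: Mapping[str, FieldValue],
                     timestamp_ns: Optional[int] = None) -> str:
    line = _escape(measurement, ", ")
    # Sorting tags by key is a commonly recommended write-performance practice.
    for key, value in sorted(tags.items()):
        line += f",{_escape(key, ',= ')}={_escape(value, ',= ')}"
    line += " " + ",".join(f"{_escape(k, ',= ')}={_format_field(v)}" for k, v in fields.items())
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line_protocol(
    "machine_data",
    {"site": "plant 1", "machine_id": "machine2"},
    {"temperature": 71.5, "load": 88, "fault": True},
    1689256800000000000,
))
# machine_data,machine_id=machine2,site=plant\ 1 temperature=71.5,load=88i,fault=true 1689256800000000000
```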
Ben Corbett: 00:15:55.391 So EDR, Edge Data Replication, is essentially a feature built into InfluxDB open source. It’s a couple of commands: you hook it up to either Enterprise or Cloud, and it durably syncs all of the writes that land in your InfluxDB at the edge into a specified bucket in your hub instance. The idea is that if one of those replication jobs fails, the data is buffered locally on disk in your edge InfluxDB, and it keeps retrying at an interval you determine until connectivity is re-established. It’s really popular with a lot of our mining and marine customers: in scenarios where connectivity can’t always be guaranteed, or bandwidth is restricted, it’s a really cool feature. Jay, who spoke at the beginning of this call, actually created a really cool Killercoda course that you can go on and click through. The link’s here; we’ll make sure to send this deck out at the end, and you can go through it and play around with EDR.
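The buffer-and-retry behavior described above can be pictured with a toy model. Writes queue up locally. Each retry drains the queue in order until the hub stops acknowledging, and anything unsent stays buffered for the next attempt. This is not EDR's implementation. EDR persists its queue on disk, while this sketch keeps it in memory. The class name and the `send` callback are hypothetical, and the retry interval would simply be how often you call `flush`.

```python
from collections import deque
from typing import Callable, Deque


class EdgeReplicationBuffer:
    """Toy model of durable edge-to-hub replication (in-memory, hypothetical)."""

    def __init__(self, send: Callable[[str], bool]):
        self.send = send  # returns True when the hub acknowledges the write
        self.queue: Deque[str] = deque()

    def write(self, line: str) -> None:
        # Every local write is also queued for replication to the hub.
        self.queue.append(line)

    def flush(self) -> int:
        """Drain the queue in order; stop at the first failure and keep the rest."""
        sent = 0
        while self.queue:
            if not self.send(self.queue[0]):
                break  # hub unreachable: remaining data stays buffered for the next retry
            self.queue.popleft()
            sent += 1
        return sent


hub = []
online = False


def send(line: str) -> bool:
    if online:
        hub.append(line)
    return online


buf = EdgeReplicationBuffer(send)
buf.write("machine_data,machine_id=m1 load=40i")
buf.write("machine_data,machine_id=m1 load=42i")
print(buf.flush(), len(buf.queue))  # 0 2  (link down: nothing lost)
online = True
print(buf.flush(), len(buf.queue))  # 2 0  (link restored: backlog replayed in order)
```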
Ben Corbett: 00:16:55.070 So the last thing I’m going to talk about before we jump into the demo is Teréga. Teréga is one of our customers, and Teréga Solutions creates digital solutions to improve energy efficiency and address decarbonization challenges. They are the creators of IO-Base, which they built with us. IO-Base is a cloud-based historian, or cloud-based digital twin. Essentially, it allows their customers to collect data from all production sites and view it in real time from anywhere, at any time, making those customers a lot more agile: improving industrial performance and increasing profitability. Their first customer was Teréga, which has a network of over 5,000 kilometers of gas pipelines in France and aims to help France reach carbon neutrality. So they had this really impressive goal in mind of moving towards the green energy transition, and they were going to do it through this cloud-based agility.
Ben Corbett: 00:17:56.337 So yeah, this is the problem statement. They had industrial sites on the left. This is the space where you have your devices, machines, and processes; these are the things that need to be optimized. However, they’re typically locked away in the OT network, behind not just firewalls but sometimes physical walls too, and it’s really, really important not to have network connectivity coming back into those networks. On the right-hand side, we can offer single-pane-of-glass access for reporting and analytics, an environment for fast innovation and development (that agility I’m talking about), and an ecosystem where you have access to best-in-class solutions and services. And in the middle, we obviously have a highly complex web of legacy infrastructure and networks, with that firewall challenge I mentioned. So here’s what they saw as the existing solution. One particular thing a lot of customers were doing was what they called data cherry-picking.
Ben Corbett: 00:18:54.410 So we have a local data historian at each site, and then we have all of these firewalls. That means quite complex outbound, and sometimes inbound, networking, which causes security issues and is, of course, a complete no-go for some customers, plus bandwidth challenges. But the main issue we saw was maintenance: you’ve got lots of different data silos and scattered data, and it just becomes really difficult to maintain. Their solution was the Indabox and IO-Base. The Indabox is a physical device that acts as a data diode, so data flows in one direction only. It’s installed within the plant and designed to be hooked up to a variety of industrial protocols and SCADA systems. The key is that there’s no local software to install, no firewall exceptions, and no local IT, and they’re really passionate about letting customers get their hands on an Indabox and trial it for security. This is a full move to the cloud: a fully cloud-based data historian, which InfluxDB powers and underpins.
Ben Corbett: 00:20:01.871 So now you’ve got that agility: data access is simplified, and you’ve got your operations, so monitoring, on-call, analytics, everything, within the cloud. This is the problem they wanted to solve with IO-Base. It’s really important to call out, if I just go to the next slide, that IO-Base is the cloud-based historian and Indabox is the device. You don’t need both; you can go for one without the other. If you just have that local security issue and want that outbound-only communication, you can go with Indabox, and if you just want the cloud-based historian, IO-Base is the way to go, and it has an API suite you can integrate with. So that’s a little bit about how InfluxDB supports that cloud-based historian service, and a little bit about the Indabox. That was the one case study I wanted to go through. As I said, Teréga did a webinar with us earlier in the week, so if you want to check that out and take a deep dive with some of the folks over there, check out the link in the chat.
Ben Corbett: 00:20:59.065 So, without further ado, the demo. What am I going to show you today? The goal of this demo is to provide an overview of how Telegraf, Cloud Dedicated, and Grafana can be used to monitor three machines on the factory floor. I’ve got a small application that I can use to simulate a fault, so you can see how some of the values change and how the dashboard responds to them. In terms of InfluxDB functionality, I’m showing you a Telegraf MQTT plugin subscribing to a locally hosted broker. Then I’m going to show you a dashboard purely based on our Native SQL functionality in 3.0, and InfluxQL as well. A little bit about the diagram here: the edge is just my laptop [laughter], or Killercoda, I should say. We’ve got Mosquitto, Telegraf, and a data generator locally hosted, feeding into a version of our Cloud Dedicated solution, and I’ve got a locally hosted Grafana there as well.
Ben Corbett: 00:22:02.425 So let me hit escape and show you what I’ve got. This is Killercoda. If you haven’t seen it before, it’s a really nice way for us to deliver demos and training to you. You can create a free account and click through; it has a lot of the code you need to run, and you can just go in, copy in the environment variables for your own Cloud Serverless instance, and even click here to have it run all the commands for you. We walk you through the setup of the demo, and it’s a really nice way to get started. You can take a look at this code and retrofit it based on the dashboards you want to see, what’s important to you, or maybe even your own MQTT broker. I will caveat this and say it’s a very run-of-the-mill IoT demonstration, but the idea is to show you some of the functionality and give you the ability to run it yourself, and hopefully we can speed up your development if you’re looking to do a POC or a quick demo with InfluxDB soon.
Ben Corbett: 00:23:06.351 So yeah, it starts off with creating your own InfluxDB cloud serverless account. This demo is designed to work with version 3, so make sure you pick a version 3 compatible region. You need to create a bucket called factory and then just generate an API token within that account, and this walks you through how to do that. Once you’ve got that API token, we need to adjust the environment variables. That is just in the environment file of the Arrow task engine here, as we’ve listed. I’ve already dropped mine in for my cloud dedicated instance, in this case. And then once that’s done, you just hit, “Execute,” hit this command here, and it will spin up all of the locally hosted stuff and also the data generator as well. So I’ve got that running in the background. One thing I wanted to show you before I jump into the Grafana dashboard is just the Telegraf config. So you can see here — for those of you who aren’t familiar with Telegraf, Telegraf is our open-source data collection agent. It is plugin-driven, so the idea is, it’s no code, and you can just go online, pull off the particular plugin you want. And you can configure a lot of aspects about Telegraf itself. These are things like how often it polls or it subscribes or scrapes, the interval in which it’s going to write to InfluxDB. You can play around with batches, jitter, and the flush interval of the buffer and things like that.
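The batching settings mentioned here correspond to Telegraf agent options such as `metric_batch_size`, `metric_buffer_limit`, `flush_interval`, and `flush_jitter`. As a rough picture of how they interact, the sketch below models an output buffer. A full batch is written immediately. A partial batch is flushed when the (jittered) interval elapses. Once the buffer limit is hit, the oldest metrics are dropped. These are assumed, simplified semantics, not Telegraf's code. Time is passed in explicitly so the behavior is deterministic.

```python
import random
from collections import deque
from typing import Deque, List, Optional


class BatchingBuffer:
    """Simplified model of an agent's output buffer (assumed semantics, not Telegraf's code)."""

    def __init__(self, batch_size: int = 1000, buffer_limit: int = 10000,
                 flush_interval: float = 10.0, flush_jitter: float = 0.0,
                 rng: Optional[random.Random] = None):
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self.flush_jitter = flush_jitter
        self.rng = rng or random.Random()
        # A bounded deque drops the oldest metric once buffer_limit is reached.
        self.buffer: Deque[str] = deque(maxlen=buffer_limit)
        self.next_flush = self._schedule(0.0)

    def _schedule(self, now: float) -> float:
        # Random jitter spreads writes out so many agents don't flush at the same instant.
        return now + self.flush_interval + self.rng.uniform(0.0, self.flush_jitter)

    def _take_batch(self) -> List[str]:
        n = min(self.batch_size, len(self.buffer))
        return [self.buffer.popleft() for _ in range(n)]

    def add(self, metric: str, now: float) -> List[List[str]]:
        """Buffer one metric and return any batches that are now due for writing."""
        self.buffer.append(metric)
        batches = []
        if len(self.buffer) >= self.batch_size:  # a full batch is written immediately
            batches.append(self._take_batch())
        if now >= self.next_flush:  # otherwise, flush whatever is buffered on the interval
            if self.buffer:
                batches.append(self._take_batch())
            self.next_flush = self._schedule(now)
        return batches


buf = BatchingBuffer(batch_size=3, buffer_limit=5, flush_interval=10.0)
written = []
for i in range(7):
    written += buf.add(f"m{i}", now=float(i))
written += buf.add("m7", now=10.0)  # flush interval reached with a partial batch
print(written)  # [['m0', 'm1', 'm2'], ['m3', 'm4', 'm5'], ['m6', 'm7']]
```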
Ben Corbett: 00:24:25.449 If I scroll down, you can see I’ve got one output plugin and one input plugin. The output plugin here is just the InfluxDB version 2 plugin. A really important thing to call out: InfluxDB version 3 is fully backwards compatible with the version 1 and version 2 write APIs. So whether you’re coming from 1.x land or 2.x land and you want to point your write pipeline at version 3, these plugins will work out of the box. Just check out the documentation on that. And then, of course, the MQTT input plugin: it’s really as simple as subscribing to the particular broker and putting in your topic. We’ve got a little bit of JSON parsing here, so this is an example of how you can parse based on the JSON payload, or even on the topics. We see a lot of topic parsing, so take a look at our documentation; I think we’ve even got some examples in our GitHub repo of how you can leverage that.
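Telegraf configures topic and JSON parsing declaratively in the MQTT consumer plugin's settings. For exact option names, see that plugin's documentation and the JSON parser documentation. The plain Python sketch below only illustrates the mapping itself: topic segments become tags and numeric JSON values become fields. The topic layout `factory/<site>/<machine_id>`, the measurement name, and the choice to drop non-numeric values are all hypothetical.

```python
import json
from typing import Any, Dict


def parse_mqtt_message(topic: str, payload: bytes) -> Dict[str, Any]:
    """Map an MQTT topic and JSON payload onto a measurement, tags and fields."""
    # Hypothetical topic layout: factory/<site>/<machine_id>
    _, site, machine_id = topic.split("/")
    body = json.loads(payload)
    # Keep numeric values as fields; drop strings (and booleans, which are ints in Python).
    fields = {
        key: value for key, value in body.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }
    return {
        "measurement": "machine_data",
        "tags": {"site": site, "machine_id": machine_id},
        "fields": fields,
    }


msg = parse_mqtt_message(
    "factory/plant1/machine2",
    b'{"temperature": 71.5, "load": 88, "vibration": 0.02, "status": "ok"}',
)
print(msg["tags"])    # {'site': 'plant1', 'machine_id': 'machine2'}
print(msg["fields"])  # {'temperature': 71.5, 'load': 88, 'vibration': 0.02}
```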
Ben Corbett: 00:25:18.553 So, on to the demo. This is my locally hosted Grafana. There are a couple of things I’m going to show you here, starting with the data sources. I’ve integrated with my InfluxDB version 3 in two ways. One is via the Flight SQL driver. As I mentioned before, InfluxDB version 3 now supports Native SQL, and Flight SQL is our recommended way to query it. So it’s no longer the InfluxDB connection with a SQL query language; it’s the native Flight SQL driver. You can just set that up and configure the aspects of that particular connection. Secondly, we’ve got the InfluxDB version 1 connector, which some of you might already use, know, and love. Here you can specify the query language, which is InfluxQL, and one of my dashboards is based on that. So I’m just going to flick over to the dashboard now. This is one of the dashboards I’ve got set up, and this is my application, which allows me to simulate a fault. I’m just going to change one of the machines, and we should see some values updating.
Ben Corbett: 00:26:25.953 So here is a typical industrial IoT dashboard spanning my whole site. I’ve got Machine Vibration here, showing all of my machines on one plot, grouped by machine, so I can analyze them relative to each other. We’ve got load across the different machines on the right-hand side. You can see now how, as I switched a fault on for machine two, the load on machine two went into the red zone and its health score dropped as well. If I switch that off and put a fault on machine one, we should see its temperature, load, and health score change. Then I’ve got the health timeline: I’ve added thresholds around temperature, load, and vibration and used them to determine health. Let me jump in here and show you one of the SQL queries behind it. I’m going to right-click, “Explore,” and this takes me to the data explorer within Grafana, where you can see my SQL query. We’ve got lots of examples and documentation online, but to be honest, when InfluxDB said it was going to support Native SQL, I got really good friends with ChatGPT. If you’re stuck with SQL, feel free to ask it a question, even tell it about your specific time series requirements or that you’re using InfluxDB 3.0, and I think you’ll get to the query you want quite quickly.
Ben Corbett: 00:27:50.257 So here I’m looking at temperature and checking whether its value is within a threshold, then the same for vibration and power, and then determining a true or false health score from those. As I scroll down, you can see true or false across machines one, two, and three. So that’s an example of how you can use the new Native SQL capability to produce some really nice dashboards in Grafana. And then, not forgetting InfluxQL: this dashboard is a little bit different. If I change the time range to 15 minutes, or even 5 minutes, and take some of these off, this is drilling down into one particular machine. You can see I’ve got machine two here, machine three. So this is a good example of using dashboard variables to drill down into particular machines. Here I’m showing the raw vibration, and we’ve got the log10 representation of that, because as I produce a fault, it skyrockets. Then there’s load again for that particular machine, and temperature.
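The health score logic described here is a per-signal range check combined with a logical AND. In SQL, that is typically one `CASE WHEN ... BETWEEN ...` expression per signal. The plain Python sketch below expresses the same logic. The threshold values, signal units, and machine readings are hypothetical and not taken from the demo.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical (min, max) bounds per signal; tune these to your own equipment.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "temperature": (10.0, 80.0),
    "vibration": (0.0, 0.5),
    "power": (0.0, 85.0),
}


@dataclass
class Reading:
    machine_id: str
    temperature: float
    vibration: float
    power: float


def is_healthy(reading: Reading) -> bool:
    """True only if every signal falls within its (inclusive) threshold band."""
    return all(lo <= getattr(reading, name) <= hi for name, (lo, hi) in THRESHOLDS.items())


readings = [
    Reading("machine1", temperature=65.0, vibration=0.10, power=50.0),
    Reading("machine2", temperature=65.0, vibration=0.10, power=97.0),  # simulated fault
]
for r in readings:
    print(r.machine_id, is_healthy(r))
# machine1 True
# machine2 False
```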
Ben Corbett: 00:28:59.364 If I dig into one of these, right-click, “Explore,” we should see the underlying InfluxQL query. So this is an example of InfluxQL: just selecting the temperature from that particular measurement, passing in the time filter and the machine ID as variables. That’s a really neat example of how you can use InfluxQL with the connector I showed you at the beginning. I think that’s everything I wanted to show you in the demo: the Native SQL connector, the InfluxQL connector, an example of how a locally hosted Grafana can integrate with InfluxDB, and how you can create dashboards, use variables, and hopefully build some nice dashboards that show what’s relevant to your use case. As I said before, this is a very generalized industrial IoT example, but if at any point you want help with how you should be storing your data, your schema, or anything like that, please get in touch. We’d love to help you out with that.
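To show what the dashboard variables do to that InfluxQL query, here is a sketch of simple `$name` substitution. `$timeFilter` is a Grafana macro. Grafana substitutes its own time-range expression for it, so the relative form used below is only a simplified stand-in. The measurement name and the `machine_id` variable are hypothetical. The function does no escaping, so treat it as an illustration of the idea rather than something to build queries with.

```python
from typing import Mapping


def render_query(template: str, variables: Mapping[str, str]) -> str:
    """Substitute $name placeholders with dashboard variable values (no escaping)."""
    # Replace longer names first so a variable like $machine can't clobber $machine_id.
    for name in sorted(variables, key=len, reverse=True):
        template = template.replace(f"${name}", variables[name])
    return template


TEMPLATE = (
    'SELECT "temperature" FROM "machine_data" '
    "WHERE $timeFilter AND \"machine_id\" = '$machine_id'"
)

print(render_query(TEMPLATE, {
    "timeFilter": "time >= now() - 15m",  # simplified stand-in for Grafana's expansion
    "machine_id": "machine2",
}))
# SELECT "temperature" FROM "machine_data" WHERE time >= now() - 15m AND "machine_id" = 'machine2'
```

Switching the machine in the dashboard's drop-down changes only the variable value. The query template stays the same, which is what makes the drill-down views cheap to build.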
Ben Corbett: 00:30:09.726 So I’m going to flip back over to this. So last but not least, a couple of resources for you. We’ve got InfluxDB version 3 information here. As I said, we’ll send this deck out, and you can click around and see some of the cool features and capabilities that you’ll be able to take advantage of. If you want a proof of concept, don’t take our word for it; let us show you what InfluxDB can do. I run these for customers all the time, and I’ve got a variety of InfluxDB 3.0 POCs running at the minute across all of our different offerings. The demo link is the demo that I did today. The Killercoda link you can click through yourself. And then, of course, if you want any help or more information around how you can get your hands on InfluxDB version 3, please get in contact with us using this email here. So, Jay, feel free to berate me now if I’ve forgotten anything from your fantastic demo. And if not, I’ll hand back over to you.
Jay Clifford: 00:31:11.630 No, that was fantastic, Ben. Thank you so much. So we know that we’ve hit the half-hour mark, but we have so many questions in, actually, Ben. So for everyone, if you’d like to stick around a little longer, we’ll try to get through as many as we can. If we can’t get to your questions now, we’ll answer them in a follow-up email after the webinar. Ben will be emailing you all, so feel free to reply with any further questions there. I’m going to go through these in no particular order, because we have some longer questions and some short ones, so we’ll quick-fire them. So, Ben, for InfluxDB 3.0, is there a C# client library?
Ben Corbett: 00:31:49.352 Yes. So I think we had three client libraries that landed immediately, and then we’ve got all of the rest of them coming, including C#, which is imminent. I’ve played around with it, so yes, it will be landing. And the cool thing about our client libraries is that if you want to use them to query with SQL, they use a Flight SQL driver, which is really, really performant, rather than just coming in through the API. So if you’re looking to adopt 3.0, although we do have backwards compatibility for InfluxQL, we recommend that you leverage those new client libraries if you want to go for SQL. So I think what we can do is find out the specific date for that, Jay, and provide it in the email follow-up. But I know that it’s imminent because I have seen it and used it. [laughter]
Jay Clifford: 00:32:37.163 Absolutely. Sweet. I’ll click there. So a good one, actually: this is a Telegraf-based one, so a more IoT-related topic. And, Lucas, I’ll get to your first question after, because it’s more InfluxDB 3.0 specific. “If you’re using the HTTP listener to receive messages from your IoT sensors and you want to scale Telegraf to handle many, many devices, would you just put an HTTP load balancer in front of it? If the output is InfluxDB, would it be a problem having many Telegraf clients connecting to the same DB?”
Ben Corbett: 00:33:13.505 Concurrent writes are a really, really good way to improve the write efficiency of your InfluxDB. I wish I could show you another diagram. Sometimes customers will have locally hosted Telegrafs on gateways or devices, and then those Telegrafs will write to a single Telegraf, which manages the connection to your InfluxDB. So there are a couple of different ways around it, but try to think about what’s scalable for your solution and how you’re going to maintain all of these configs in an automated way, right? There’s no problem when it comes to concurrent connections, but we also really do recommend batching: InfluxDB accepts batches, and batching makes writes really performant. So if each of those streams is going to produce small batches, it might be a good idea to have a Telegraf that sits in front of all those connections, collates the data from all of them, creates a batch, and writes it. You want to be getting batches in the realm of about 5,000 lines of line protocol. So if you’re well below that, maybe start to adopt a bit of the Telegraf architecture that I just described.
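The collate-then-write pattern Ben describes can be sketched as below. `batch_lines` is a hypothetical helper, the device traffic is made up, and 5,000 is simply the ballpark figure from the answer, not a hard limit. In Telegraf itself, the equivalent knob is the agent’s `metric_batch_size` setting.

```python
from typing import Iterable, Iterator, List


def batch_lines(lines: Iterable[str], batch_size: int = 5000) -> Iterator[List[str]]:
    """Group individual line-protocol lines into batches of at most batch_size lines."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush whatever is left over as a final, smaller batch
        yield batch


# Hypothetical traffic: 12 devices, each sending 1,000 single-point messages.
incoming = (
    f"sensors,device=dev{d} temperature={20 + i % 5}.0 {1_700_000_000_000_000_000 + i}"
    for d in range(12)
    for i in range(1000)
)
batches = list(batch_lines(incoming))
# Each batch would be sent as one write request, with lines separated by newlines.
payloads = ["\n".join(batch) for batch in batches]
print([len(b) for b in batches])  # [5000, 5000, 2000]
```

Twelve thousand tiny writes become three requests, which is the efficiency gain a fronting Telegraf provides.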
Jay Clifford: 00:34:21.622 Fantastic. So I’m just going to take a quick fire one from the channel, then I’ll go back to the Q&A. So is OPC UA or any other industrial standard compatible with InfluxDB? And I believe you touched on this earlier with Telegraf. So do you want to just quickly reiterate what we’ve got?
Ben Corbett: 00:34:39.034 Absolutely. Let me go back to the particular slide. So we’ve got a variety of Telegraf plugins which are compatible with the industrial IoT space. OPC UA is one of our most popular ones, yes. Take a look at the Telegraf docs; I think we’ve got some examples in there. If not, please reach out to us. I’ve got a variety of Telegraf configs from my own proofs of concept, which I keep just to help customers get over that initial hurdle. But yes, essentially, the OPC UA connection is a Telegraf thing. The data comes into Telegraf, and the output plugin from Telegraf can be InfluxDB version 1 or 2, which can write to InfluxDB version 3 because it is fully compatible with those Telegraf output plugins. So I guess the headline here is that InfluxDB version 3 doesn’t know that it’s OPC UA; that’s the Telegraf thing.
Jay Clifford: 00:35:36.298 Sweet. I’m just going to take two questions off your hands to give you a quick breather. So we’ve got a specific one here about 3.0 and the Apache ecosystem. I’m a big fanboy, so I thought I’d answer this one. “Even though it’s not IoT specific, more 3.0 specific, I read that Ballista [Apache Ballista] could be used to spread queries across the cluster. Would you be able to elaborate a bit more on that? I understand that Ballista is something that augments DataFusion scheduling and optimization of SQL queries spread over the cluster, but I fail to understand how this will work with InfluxDB 3.0.” So just to give a little bit of context, you can imagine Apache Ballista like Apache Spark. It’s essentially a processing pipeline: you can bring in queries and process that data in parallel across a cluster.
Jay Clifford: 00:36:32.876 So currently, there are two ways of interacting with Apache Ballista with 3.0. We now have the Flight SQL JDBC driver, so you can interact directly with Ballista through that method and query InfluxDB directly. The other way is that Ballista actually has a Python client, a bit like how it works with Apache Spark. So you can just use our Python client library: since we expose the Flight reader with the Python client, you can use that to bring in Apache Arrow tables and work through Apache Ballista from there. So there are two main methods of using that platform with 3.0. I’ll mark that one as done. And then the other one I will quickly answer: “Will the columnar approach eventually make its way into the OSS version, or is OSS actually going to stick with the prior bolt engine?” To answer that question, yes, we are in the process of formulating 3.0 for OSS. You’ll hopefully hear about that at some point soon. But yes, this technology will come to OSS as well, which is great news. So there we go. Right. Back to you, Ben. I’m just going to mark this one as answered live. Okay. Here’s an interesting one for you. “How would you store metadata, e.g. units, in Influx? One sensor is measuring temperature, another sensor is measuring weight. The first has degrees as a unit and the second has kg. What would you do about that?”
Ben Corbett: 00:38:17.097 Yeah. So there are a couple of different ways to do it. Tags have always been the way that metadata is supposed to be stored within InfluxDB. For those of you who don’t know, line protocol is what lands on the API, and line protocol is made up of a measurement name; a tag set, which is essentially what you want to query by or group by; and then your fields, which are the actual data. So there are two ways around it. Either you keep only your identity information in the tag set, so your tags stay very lean, and you have a separate master data store; then in your backend, or at query time, you match your tag set (your device ID or anything like that) against the specific information stored in your metadata store. Or, to be honest, adding the units as a field or a tag won’t have an impact on cardinality in version 2, because there should be a one-to-one relationship between a unit and a specific field. So there are a couple of different ways around it, and I’ve got lots of customers that do either. It all depends on how you want to visualize and query it, right? That can mean storing the unit as a field alongside your other fields, or storing it in a tag as metadata.
Jay Clifford: 00:39:31.958 Fantastic. Thanks for that one, Ben. And I’ll do a quick fire one for you. Will Azure support InfluxDB 3.0?
Ben Corbett: 00:39:43.513 Yeah. So I’m really passionate about this, because obviously I work in the European region, and Azure is very, very popular within Europe, particularly in our German region. So our German customers are very excited to know that Azure is going to be landing in InfluxDB Cloud Dedicated. I think we’re looking at roughly this quarter for Azure, and then GCP hopefully by the end of the year. But also Clustered, which is the next edition of Enterprise, will be designed to be deployed anywhere: any of the major cloud providers, bare-metal servers, all of that. So that’s the plan when it comes to Azure. InfluxDB Cloud Serverless will only be available on AWS, I believe. So, unfortunately, for our customers that really do want to go for Azure, we’ll be looking to move you over to either InfluxDB Clustered 3.0 or Cloud Dedicated.
Ben Corbett: 00:40:40.575 But one of the key things that I always dig into with customers is that the reason a lot of them want it deployed in Azure is often commercial, to avoid egress costs, and every single time I go through and calculate these egress costs with customers, they’re negligible in comparison to the cost of the rest of their infrastructure. So I’d really encourage you to work out the number of gigabytes you would be pushing out into an AWS solution and calculate what that cost might be. But I also know a lot of our customers want the Azure Marketplace because they just have loads of Azure credits. So we can do things like private marketplace offers on Azure for InfluxDB Cloud Dedicated when it lands at the end of this quarter, too. So if you do have a really strong Azure requirement, depending on what it is, we do have routes forward for you. So please get in touch. Hopefully, that —
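The back-of-the-envelope calculation Ben suggests is just data volume multiplied by a per-GB price. The figures below (10,000 points per second, 100 bytes per point, $0.09 per GB, a 30-day month, 1 GB = 10^9 bytes) are placeholders, not quoted prices or measured sizes; substitute your own write rate and your provider’s current egress pricing.

```python
def monthly_egress_cost(points_per_second: float, bytes_per_point: float,
                        price_per_gb: float,
                        seconds_per_month: int = 30 * 24 * 3600) -> float:
    """Estimated monthly egress cost = data volume in GB * price per GB."""
    gigabytes = points_per_second * bytes_per_point * seconds_per_month / 1e9  # 1 GB = 10**9 bytes
    return gigabytes * price_per_gb


# Placeholder inputs only; replace with your real write rate and your provider's price.
cost = monthly_egress_cost(points_per_second=10_000, bytes_per_point=100, price_per_gb=0.09)
print(f"${cost:,.2f} per month")  # $233.28 per month
```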
Jay Clifford: 00:41:36.844 [crosstalk] —
Ben Corbett: 00:41:37.153 — answers the question. [laughter]
Jay Clifford: 00:41:39.419 Sorry, that was my bad. I think I froze there. So that was me. Sweet. So just a quick one. This is a new one for me, [inaudible], just to see if you know it, Ben. So Jacob is asking, “Will a driver for JasperReports be made available for InfluxDB?”
Ben Corbett: 00:41:57.967 I’m Googling. [laughter] I actually don’t know. I’m not familiar with that one at the moment. So maybe we’ll have to get back to Jacob on that.
Jay Clifford: 00:42:05.365 I must admit, I’m the same. Jacob, we’ll definitely get back to you afterwards on that one. It’s always good to learn about new platforms and technology that we can integrate. So we can always get back to you by email on that one.
Ben Corbett: 00:42:15.223 What I’d be looking for there is whether JasperReports leverages something like a JDBC or an ODBC driver, or maybe even a Java client or something like that. So we should dig into that one.
Jay Clifford: 00:42:27.215 Reach out. Right. So, again, I’ll chuck this one out there really quickly. This one’s not 3.0 related. It’s Enterprise related, I believe, just to — “I was going through the documentation of InfluxDB version 1.10, and it says, ‘Deletes and updates are restricted for better performance.’ In our environment, the application team is running a few delete statements in the back end, which is affecting the performance of the server application. When we know that deletes and updates will cause issues, why can’t we deprecate them? Please correct me if I’m wrong, as I’m trying to understand more about it.”
Ben Corbett: 00:43:05.572 So that question is around the performance of the delete operation within InfluxDB. Is that right, Jay?
Jay Clifford: 00:43:13.450 Correct. Yes.
Ben Corbett: 00:43:13.759 Yeah. Yeah. Yeah. So as it stands at the moment, we are trying to build some drop and delete commands into 3.0. They don’t exist at the moment. But yeah, as you know, they are very computationally expensive. We really encourage customers to think about time series in a slightly different way and leverage things like retention policies. But obviously there are situations where deleting data within a measurement or within a particular database, not just dropping a whole database, is necessary. So we are going to have capabilities to do it for those customers where it is a must-have. But we strongly encourage you to design your time series data so that you leverage retention policies to evict old data, rather than running business-as-usual delete operations. InfluxDB is not a typical transactional database, so deletes are computationally expensive.
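To see why retention is cheaper than business-as-usual deletes, it helps to picture data stored in time-based partitions: expiring data means discarding whole partitions, whereas a delete has to find and rewrite individual rows. The sketch below is a conceptual model only, not InfluxDB’s storage engine; the daily buckets and the function names are assumptions (InfluxDB 1.x, for example, drops whole expired shard groups).

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone
from typing import Dict, List, Tuple

Point = Tuple[datetime, float]


def partition_by_day(points: List[Point]) -> Dict[date, List[Point]]:
    """Group points into daily buckets, loosely like time-based shards or partitions."""
    buckets: Dict[date, List[Point]] = defaultdict(list)
    for ts, value in points:
        buckets[ts.date()].append((ts, value))
    return dict(buckets)


def enforce_retention(buckets: Dict[date, List[Point]], retention_days: int,
                      today: date) -> Dict[date, List[Point]]:
    """Drop each bucket whose whole day falls before the cutoff; no row-level deletes."""
    cutoff = today - timedelta(days=retention_days)
    return {day: pts for day, pts in buckets.items() if day >= cutoff}


# Hypothetical data: one reading every 6 hours from 1 to 12 July 2023.
start = datetime(2023, 7, 1, tzinfo=timezone.utc)
points = [(start + timedelta(hours=6 * i), float(i)) for i in range(48)]

kept = enforce_retention(partition_by_day(points), retention_days=7, today=date(2023, 7, 13))
print(sorted(kept))  # days 2023-07-06 through 2023-07-12 survive
```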
Jay Clifford: 00:44:12.289 Awesome. I’m just going to bundle a few questions together and quickly answer them, because we’re working on this internally. We’ve had a few questions about whether we’re going to be integrating machine learning platforms with InfluxDB 3.0, and just to confirm, this is a definite yes. We actually have a research paper and a how-to guide that will be released soon on some of the platforms you can use to integrate machine learning directly with 3.0. One of the platforms I highly recommend you check out is Quix (quix.io), which is built with InfluxDB under the hood. It’s an amazing group of people who can provide streaming machine learning, but also let you do batch machine learning on your data within InfluxDB. So we have some material coming out on that very, very soon, and much more. So definitely lots more machine learning abilities coming. The demo in that case is actually based on the example you saw here: it provides anomaly detection on the vibration data and also forecasts expected vibration and load levels based on your data. So stay tuned for that.
Jay Clifford: 00:45:22.440 So that’s that question. I think we’ll answer one more, Ben, and then we will close it out. We have so many questions, so we’ll get to everyone else’s via email afterwards. Let me get to a good one. So just [crosstalk] —
Ben Corbett: 00:45:42.199 While you’re getting the next one out, Jay. Oh, no, go ahead. I’ll take this one at the end. I’ve just been digging into Jacob’s requirement for JasperReports, and it looks like it accesses relational databases through a JDBC driver connection. So that will again be supported through the Flight SQL bridge for JDBC.
Jay Clifford: 00:46:00.571 Fantastic. Let me just see if we’ve got a good one. Here we go. “So do we have integrations for Tableau or other BI tools? And when will these integrations be available?” Great one to end on.
Ben Corbett: 00:46:14.926 Yeah, absolutely. So again, one of the things we were really passionate about with Native SQL support was being able to integrate with SQL-based tools. We had lots of customers who had built their whole enterprises around solutions like Tableau, Power BI, Superset, and things like that. So, as I mentioned earlier, we’re going to have JDBC and ODBC bridges for Flight SQL, so we’ll be integrating with those tools. Have you played around with any of them yet? I know we’ve got Tableau set up, and Superset, and things like that, Jay. Are they fully available in 3.0 yet, or are they imminent? So Tableau is fully available, Superset’s fully available, and we’ve got Grafana there, but that’s not the Power BI-related one. We’re working on, oh, the Microsoft one escapes me. [crosstalk] —
Jay Clifford: 00:47:11.990 Well, not Power BI?
Ben Corbett: 00:47:13.615 Sorry, Power BI. Apologies. So Tableau is available, Power BI is imminent. We’re just working on a few things with the guys there at Microsoft just to try and get the plugin finished. So yeah, that is imminent.
Jay Clifford: 00:47:24.931 Is that because some of those are ODBC and some of them are JDBC? Is that why we’ve done one, and then the other one’s imminent?
Ben Corbett: 00:47:31.673 Correct. Yes, yes. There’s different ways of connectivity. Yeah. [crosstalk] —
Jay Clifford: 00:47:36.489 Fantastic. This is brilliant. The number of questions we have here is fantastic; we’d be here all day. So we will definitely answer some of those afterwards. Well, first of all, thank you so much, Ben. You have been an absolute star, and we’re excited to have you on a few more of these, hopefully. [crosstalk] —
Ben Corbett: 00:47:51.868 Yeah, absolutely. Thanks for having me. It’s been great fun.
Jay Clifford: 00:47:54.658 So for everyone else on the call, just to round this out: Ben will be sending an email later answering the questions that were left in the chat, so we will catch those for you. Feel free to reply to that email, and we’ll respond to any questions you have in there. So that’s one method of communication. If we don’t get to you there, then please drop into the community as well, and our lovely community members, including us, will answer your questions. And yeah, thank you ever so much for your time. We really hope you enjoyed it. This is part of a greater series, and we’ll be looking at enterprise IoT and at other demos, including observability and things like that. So really excited for more to come.