How to Enable Industrial Decarbonization with Node-RED and InfluxDB
Session date: Oct 05, 2021 10:00pm (Pacific Time)
Graphite Energy’s thermal energy storage (TES) platform encourages clients to offset their traditional energy consumption with low-cost renewable energy sources. Their customers include manufacturers, mines, steelmakers and aluminum plants. IIoT data is collected about energy usage, fuel consumption, temperatures, solar panels, wind farms, process steam and air dryers. Discover how Graphite Energy uses InfluxDB to monitor their zero-emission energy solution.
In this webinar, Byron Ross will dive into:
- Graphite Energy's approach to reducing their clients' carbon footprint
- Their methodology for collecting sensor data used to make their operations greener
- Why they chose a time series database over a data historian
[et_pb_toggle _builder_version=”3.17.6” title=”Transcript” title_font_size=”26” border_width_all=”0px” border_width_bottom=”1px” module_class=”transcript-toggle” closed_toggle_background_color=”rgba(255,255,255,0)”]
Here is an unedited transcript of the webinar “How to Enable Industrial Decarbonization with Node-RED and InfluxDB”. This is provided for those who prefer reading to watching the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.
- Caitlin Croft: Customer Marketing Manager, InfluxData
- Byron Ross: Chief Operating Officer, Graphite Energy
Caitlin Croft: 00:00:00.226 All right. I think we’ll get started here. Once again, hello, everyone, and welcome to today’s webinar. My name is Caitlin Croft. I’m very excited to have Byron Ross from Graphite Energy here to talk about how Graphite Energy is using InfluxDB and Node-RED to enable industrial decarbonization. So please post any questions you may have for Byron either in the Q&A or the chat. This session is being recorded. And without further ado, I’m going to hand things off to Byron.
Byron Ross: 00:00:36.612 Thank you very much, Caitlin. And thank you all for coming and watching a talk on a part of the world that we think is slightly left-field and some of the hidden infrastructure of how your food’s made in this case. Our primary mission is to enable industrial decarbonization. So we turn renewable electricity into heat, which helps process industries, food manufacturing, steel, aluminum to decarbonize and ultimately to replace fossil fuels. We take a variable renewable electricity resource, be that solar or wind energy, and we put that into an industrial process that has its own different-shaped requirements. And we do this using our thermal energy storage product. And what thermal energy storage does is it takes that input, stores it, and then allows it to be returned back into the process when the process needs it. So it decouples the variable renewable energy availability from the industrial process requirement on the other side. And this is a typical instance of this storage, a big orange box. What does it mean to the customer? It lets them meaningfully reduce their fossil fuel use, so reduce their CO2 impact. And it requires that they move the storage from the gas pipeline to their process plant to take advantage of this opportunity. As gas prices go up and renewable energy prices come down, there’s a big opportunity here for the customers to reduce their exposure to carbon and to the increase in gas prices.
Byron Ross: 00:02:23.971 So how do we do this? Well we use an old term internally, which is Industry 4.0 because we first met this in Germany many, many years ago with some of the working groups over there. And data really drives our success and our customer success. Data these days is everywhere. We get it from our machines, from our customers’ machines, we use weather data. We’ve got internal and third-party APIs. And we need to bring that all together so that we can operate the machines, we can allow remote operation and maintenance and monitoring, engineering of those machines. It helps us with our product development and value engineering activities. And it directs some of our key research tasks with our research partners. And I think InfluxDB is really helping us to get that data of vision into our business context. Back in 2007 was when we really started trying to manage our data in a more formal way. This is an installation in northern Germany. We developed the control system for the mirrors. It has 2200 independent machines that needed to be orchestrated in real-time. We used a traditional, what’s called a historian in the industrial context, which was a way to store time-series data. We didn’t have the words back then. Almost all of the analysis was batch-based whether it was online or offline. And if we compare that with where we are today, we log everything. We collect the data locally and in the cloud. We use a time-series database designed for storing this type of data. We make pretty extensive use of web dashboards. We’ve got some cloud-based analytics, some edge compute, digital twin kind of strategies that we’re working on.
Byron Ross: 00:04:22.379 And most of the analytics is done in real-time now as opposed to being batch-based processes. From a data point of view, if we look back in 2007, each of those machines was producing about 10 different types of data that we could store every minute or so. We kept those records. So you could end up with about 15,000 records per day per machine. And if we compare that with today, each machine has got more than 100 different sensors on it. We record that every 1 to 15 seconds depending on the type of the machine. So we can easily produce a million independent records every day. So we needed a time-series database. And it took us a while to find the words for this. To be a little glib, once we started dealing with this data, SQL chokes. And Excel doesn’t do time. And I mentioned Excel because we deal with a lot of mechanical and electrical and process engineers. And one of the things they do is they start modeling in Excel and other similar tools. And time in Excel is not much fun. SQL also worked well for us for a while. But we quickly got to the point where the queries were starting to really slow down. And I’ll come to that - a bit of the detail of that later. So what we needed was a database that was specialized for time-driven data series, sensor and actuator data. We are much more interested with the type of data we’re collecting to know how was the system as at a particular time rather than what belongs to what. So we’re related in time, not space. Relational databases really don’t work very well for storing this kind of information.
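The record counts quoted above can be checked with a few lines of arithmetic. This is a minimal sketch in Python: the series counts and sampling intervals are the round figures from the talk, and the function name `records_per_day` is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def records_per_day(series_count: int, interval_seconds: float) -> int:
    """Values stored per machine per day at a fixed sampling interval."""
    return int(series_count * SECONDS_PER_DAY / interval_seconds)


# 2007: about 10 data types per machine, stored roughly once a minute.
legacy = records_per_day(10, 60)

# Today: 100 or more sensors per machine, sampled every 1 to 15 seconds.
slow = records_per_day(100, 15)
fast = records_per_day(100, 1)

print(legacy, slow, fast)
```

At one sample per minute the 2007 figure comes to 14,400 records per day, in line with the "about 15,000" quoted. With 100 sensors the daily count runs from 576,000 at a 15-second interval to 8,640,000 at a 1-second interval, so a million records a day is reached easily.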
Byron Ross: 00:06:15.537 Another fundamental difference is that you tend to write the data once and read it many times. And you append data to a series. You don’t go back and modify a series that already exists. So when we think of “as at,” I guess it’s kind of like some of the event sourcing concepts in the pure software world. We want to know what was the state of a system at a particular time. And again, I’ll come back to some of the detail about why this is important. But the regularization of that data is critical to it being able to be transferred and used in other systems. And quite clearly, it’s coming in jagged. It’s coming at different times, at different frequencies. But I’m very interested to know what did my system look like at this time. And a time-series database helps me get that. The lifetime of a series, you write the value once, you retrieve it many, many times in many different ways, and you don’t ever edit it in place or delete it. We are very driven by queries in this space. The input data, as I mentioned, is irregular. It comes in at different times from different places. But the output that we’re interested in is rectangular in time. So CSV in our world is how data is exchanged between different systems and processes. It’s very old. It’s very flaky. But it works, and it’s ubiquitous.
Byron Ross: 00:07:56.554 The important thing about having that rectangular data is that it lets us provide clocking for either a digital twin or for a lot of the third-party tooling that we use, so thermal analysis toolkits and other kinds of engineering tools driven by the clock that comes out of that rectangular data. And Influx, for us, makes that very easy to achieve. We need to have configurable time steps in our queries. We might need to look at a whole year’s worth of data and say, “What does it look like every hour?” We might need to look at the last five minutes and say, “What does it look like every second?” And being able to do that easily is quite important. And on that, we need to know how we’re going to treat the data in each time step. So we have this irregular data. We want to produce the regular data. How do we want to have the system deal with the manufacturing, if you like, of the data in those time steps, min, max, mean, median? Do we want to apply a function to it? What are we going to do with missing data? When is data missing? These are all important questions to be able to answer at query time. Important for us is can my mechanical or electrical engineer drive the query language. So can they go and self-serve the data that they’re interested in at any point in time? And for that to be achieved, the query language has to be fast to learn, to write, and to execute. We run lots of queries every day. And we don’t want to have a big support team having to train the engineers in how to operate that. We’ve been doing this for a little while. And in production, we have gone through three primary systems for storing this time-series sensor information.
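The idea of turning irregular samples into rectangular data with a configurable time step and a chosen aggregate can be illustrated outside the database. The following is a minimal Python sketch, not Graphite Energy's implementation. The function name `regularize`, its parameters, and the sample values are all assumptions made for illustration. In Flux the equivalent work is typically done at query time with `aggregateWindow`.

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def regularize(
    samples: Sequence[Sample],
    start: float,
    stop: float,
    step: float,
    agg: Callable[[List[float]], float] = mean,
    fill: Optional[float] = None,
) -> List[Tuple[float, Optional[float]]]:
    """Bucket irregular samples into windows [t, t + step) and aggregate each.

    Windows that received no samples get `fill`, which makes the
    "what do we do with missing data" decision explicit.
    """
    n_steps = int((stop - start) // step)
    buckets: List[List[float]] = [[] for _ in range(n_steps)]
    for ts, value in samples:
        if start <= ts < start + n_steps * step:
            buckets[int((ts - start) // step)].append(value)
    return [
        (start + i * step, agg(bucket) if bucket else fill)
        for i, bucket in enumerate(buckets)
    ]


# Jagged input: two samples land in the second window, none in the third or fourth.
samples = [(0.4, 10.0), (1.1, 12.0), (1.9, 14.0), (4.2, 20.0)]
per_second_mean = regularize(samples, start=0, stop=5, step=1)
per_second_max = regularize(samples, start=0, stop=5, step=1, agg=max)
```

Swapping `agg` for `min`, `max`, or `statistics.median` changes how each time step is manufactured, and `fill` controls whether empty windows appear as gaps or as a default value.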
Byron Ross: 00:09:49.253 Back in 2007, because we didn’t know any better, we used Microsoft SQL, and we came up with our own format for dealing with that. This was sort of the industry standard at the time for storing the data. Probably about two and a half years ago, I think we moved across to Azure Time Series Insights, which, for us, was an absolute revelation. It allowed us to do what we needed to do in terms of provisioning the systems, collecting the data, and also some of the analytics and queries. And then 18 months ago, I think, in anger, probably a little bit more recently than that, we moved entirely over to InfluxDB. And we think we’ve found a home for the moment. Things we’ve tried - we’ve tried various NoSQL databases. We tried OpenTSDB. It was one of the first ones we experimented with, Graphite. We even had a little flirt with kdb, which was interesting. It was very performant. It was not particularly discoverable. So our current strategy involves using Node-RED for data collection and engineering and InfluxDB for storage. This is what our data architecture looks like today. We’ve got a local control network in the orange box. We have some data collected through our machine data interface. We have a local time-series instance. We have operator interfaces, EIP integrations, and a little bit of edge compute with some digital twin work that we’re doing. That all exists at the customer part.
Byron Ross: 00:11:40.782 And then to support the customer, we have a remote operation center, which has another Influx instance. And that’s where we’ve got dashboards and engineering support tools. And then across in the cloud is our primary data store where we do some additional computing. We access particularly third-party APIs. And we can glue that together with automation. A typical machine network looks very different certainly to when I started. Now we have the data logger, which is the hub of the data collection. It has its own network that goes and talks to a bunch of independent machine controllers. This is all in one installation, for example. And it allows us to process the data on device and ship it out to our time series store. It also lets us do this in-service digital twin, which I’ll show you only because we’re inordinately proud of it. This is something that’s enabled by the time series data. And it’s where we can have a real-time digital model that is accurate to within about 5% of what we’re seeing in service machine. And it lets us roll forward and backwards in time and is becoming a very powerful part of our predictive toolkit for production optimization. Our ecosystem around time-series data is a little bit eclectic. We use Node-RED for collection. We use a Smartsheet for configuration, InfluxDB for time-series store. We use some IEC programming languages that I won’t get into too far. They’re a little esoteric. But they’re on the control side. And the consumers in our organization are using mostly Python and C-sharp. But they can use whatever they want.
Byron Ross: 00:13:38.161 So the Node-RED workflow that we’ve developed over the last two years is based around an industrial controller manufactured by a German company called Weidmuller. And we have a good relationship with their engineering team in Germany. And this is a product that combines a real-time logic controller system with a Node-RED instance on the side, which we use for data collection. And this has allowed us to do some things that technically weren’t possible in this space until quite recently. Fundamentally, we get configuration from a Smartsheet record. We then get the data, engineer the data, and post it into the databases. Now we need to do a lot of on-device engineering because these industrial protocols we’re dealing with are old and clunky. They tend to be byte and bit-based. So there’s a lot of masking and unpacking and reordering of bytes, converting streams into floats and integers and the like. It can get pretty low-level. So when we moved to Influx, one of the things we lost from Azure and that you would also get from AWS, etc., is the ability to configure your IoT devices from the cloud. So in order to push engineering updates, it’s quite nice to be able to do that without having to remote into the device through your VPN and your low-bandwidth cellular connection, to do it through the cloud and have the device pick up the configuration when it’s available. And so we use Smartsheet for various reasons.
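The masking, unpacking, and byte reordering mentioned above usually looks something like the following. This is a hedged Python sketch rather than the Node-RED code from the talk. It assumes a protocol that delivers 16-bit registers, with a 32-bit float split across two of them and status flags packed into a third. The register layout, word order, and function names are all hypothetical.

```python
import struct


def registers_to_float(high_word: int, low_word: int, word_swapped: bool = False) -> float:
    """Combine two 16-bit registers into one IEEE 754 32-bit float.

    Some devices send the low word first; `word_swapped` reorders the
    words before decoding.
    """
    if word_swapped:
        high_word, low_word = low_word, high_word
    raw = struct.pack(">HH", high_word, low_word)  # 4 bytes, big-endian
    return struct.unpack(">f", raw)[0]


def status_bit(status_word: int, bit: int) -> bool:
    """Mask a single flag out of a packed status register."""
    return bool((status_word >> bit) & 1)


# 0x41CC0000 is the 32-bit float encoding of 25.5.
temperature = registers_to_float(0x41CC, 0x0000)
swapped = registers_to_float(0x0000, 0x41CC, word_swapped=True)
door_open = status_bit(0b0100, 2)
```

Keeping the decoding in small pure functions like these makes it straightforward to unit test the byte handling away from the device.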
Byron Ross: 00:15:25.110 But what it lets us do is configure the operating environment of the system, where does the data come from, what we want to call the data series, what node IP address does it come from, what size is it, where is it in the stream that comes from these remote devices, what sort of pre-processing we want to do, what sort of post-processing, scaling, et cetera, do we want to do to this information. As I said, it comes in at a very raw byte-based format. We then want to turn it into a useful engineering unit for storage and later retrieval. And so this seems pretty straightforward. And it’s probably not that complicated. And this is what works for us, which is we make an HTTP request to the Smartsheet API. We convert that result to JSON. And we basically turn that into a format that lets us merge it later with the processed data. So the data engineering is the meat of what we do on the device, which is collect the data from the remote nodes that comes in as those byte streams and convert that into useful engineering information for subsequent use. We also need to merge it with the Smartsheet data to enable us to do the mapping and the bias and gain correction if you like. One thing that’s nice about Node-RED is that it uses this message-based format to transfer between nodes. And so you can build on the messages and add and remove and remix the data that’s appended there to your heart’s content. And you can keep reforming it until it looks how you need it to look.
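Merging the configuration with the decoded data, and applying the gain and bias correction, can be sketched as below. This is an illustrative Python example only. It assumes the configuration rows have already been fetched and parsed into dictionaries, and the column names (`source`, `series`, `gain`, `bias`) and sample values are hypothetical, not the actual Smartsheet layout.

```python
from typing import Dict, List


def apply_config(raw: Dict[str, float], config: List[dict]) -> Dict[str, float]:
    """Map raw channel values to named series in engineering units.

    Each config row names the raw source channel, the series name to
    store it under, and a linear correction: value * gain + bias.
    Channels without a config row are dropped.
    """
    engineered = {}
    for row in config:
        source = row["source"]
        if source not in raw:
            continue  # channel not present in this reading
        engineered[row["series"]] = raw[source] * row["gain"] + row["bias"]
    return engineered


# Hypothetical configuration rows and one raw reading.
config = [
    {"source": "ch0", "series": "steam_temp_c", "gain": 0.25, "bias": -50.0},
    {"source": "ch7", "series": "unused_channel", "gain": 1.0, "bias": 0.0},
]
raw = {"ch0": 4000, "ch1": 1234}

engineered = apply_config(raw, config)
```

Because the mapping lives in data rather than code, an engineering update becomes a change to the configuration sheet instead of a change on the device.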
Byron Ross: 00:19:16.098 So that allows the “add Influx http headers”, which is this second node here, to add the data it needs to the message object, and then the “send to Influx node”, which is our third node there to access from the message object itself the configuration it needs to access the database. So working in InfluxDB, this is much more enjoyable than working in some of the other languages we’ve played with, some of the other databases we’ve played with. Our main development and operation workflow is based around InfluxDB and Grafana. We use Grafana extensively for internal and customer dashboards. We use it to do some simple automation tooling. Whether the data is being transmitted, we can monitor some sensors and make decisions on that using Grafana. We also do a little bit in Flow and Python for our automation. Our digital twin is written in Python. And we use Teams and email. And we have some toolchains for batch CSV exporting. That helped the engineers to get the data they need to do the job. And when I say engineers, I’m talking about the mechanical and process engineers. We really, really like Flux. I believe this was a new query language with Influx 2. Caitlin might correct me later. But we have started to do some nice things at query time that traditionally we would have done later in the toolchain. We would have probably done that in a different way.
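The “add Influx http headers” and “send to Influx” steps described above correspond roughly to building an InfluxDB 2.x write request. The sketch below is in Python rather than Node-RED and only constructs the request without sending it. The measurement, tag, and bucket names are made up, and the simplified line protocol builder assumes float fields and names that need no escaping (no spaces, commas, or equals signs).

```python
from typing import Dict, List
from urllib.parse import urlencode


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, float], timestamp_s: int) -> str:
    """Format one point as InfluxDB line protocol (simplified, no escaping)."""
    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={float(value)}" for key, value in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_s}"


def build_write_request(base_url: str, org: str, bucket: str,
                        token: str, lines: List[str]) -> dict:
    """Assemble URL, headers, and body for the v2 write endpoint."""
    query = urlencode({"org": org, "bucket": bucket, "precision": "s"})
    return {
        "url": f"{base_url}/api/v2/write?{query}",
        "headers": {
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        "body": "\n".join(lines).encode("utf-8"),
    }


line = to_line_protocol("tes", {"machine": "m1"}, {"temp_c": 412.5}, 1633500000)
request = build_write_request("https://influx.example.com", "acme", "plant",
                              "PLACEHOLDER_TOKEN", [line])
```

The timestamp is passed in explicitly so that it can be set on the device at collection time rather than at arrival in the database.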
Byron Ross: 00:21:15.834 Because it’s a functional language, you can define functions to be used during the query process. So this thing here is calculating mass flow using the ideal gas law. The important thing is not how it works but that there’s a function defined before the query. And then in the query itself, what we do is get the rectangular data that matters to this function. And in this case, we group it by the time because we’re interested in what was the pressure, the temperature, and the flow at that particular time. And then we just order them by time. This is a simplified query. One of the learnings that we’ve had when using the web tooling, particularly, is that the graph view is nice, but the table view is useful. So very quickly, we found that we go to the table-based view to understand what data is being returned from the database and how we should then continue to process it. And this is a way for us to transform these three data series that are the primary storage data into the engineered value, which is the flow, which is what we’re actually interested in at this point. And so we use the reduce method here. And that has a nice accumulator model. And it just processes through using our calculate mass flow function and produces a new dataset which now has a field in it called in-gas mass. And then in order to use this in subsequent steps in the query, we want to then map that into something that looks like what gets returned from the raw database. And so that’s when we just map it into this time, field, value and group it back by field.
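The shape of that query (define a function, line up pressure, temperature, and flow by time, compute one engineered value, and emit it as time, field, value) can be mirrored in Python. This is a sketch of the same idea, not the Flux code shown in the webinar. The field names, units, and the choice of air as the gas are assumptions. Density comes from the ideal gas law, rho = P * M / (R * T), and mass flow is density times volumetric flow.

```python
from typing import Dict, Iterable, List, Tuple

R = 8.314462618  # universal gas constant, J/(mol*K)
MOLAR_MASS_AIR = 0.0289647  # kg/mol, assumed gas for illustration

Record = Tuple[int, str, float]  # (time, field, value)


def mass_flow_kg_s(pressure_pa: float, temperature_k: float, flow_m3_s: float,
                   molar_mass: float = MOLAR_MASS_AIR) -> float:
    """Mass flow from the ideal gas law: (P * M / (R * T)) * volumetric flow."""
    density = pressure_pa * molar_mass / (R * temperature_k)
    return density * flow_m3_s


def pivot_by_time(records: Iterable[Record]) -> Dict[int, Dict[str, float]]:
    """Group (time, field, value) records into one row per timestamp."""
    rows: Dict[int, Dict[str, float]] = {}
    for ts, field, value in records:
        rows.setdefault(ts, {})[field] = value
    return rows


def engineer_mass_flow(records: Iterable[Record]) -> List[Record]:
    """Turn three raw series into one engineered series, ordered by time."""
    needed = {"pressure_pa", "temperature_k", "flow_m3_s"}
    out: List[Record] = []
    for ts, row in sorted(pivot_by_time(records).items()):
        if needed <= row.keys():  # skip timestamps with a missing input
            out.append((ts, "in_gas_mass", mass_flow_kg_s(
                row["pressure_pa"], row["temperature_k"], row["flow_m3_s"])))
    return out


records = [
    (0, "pressure_pa", 101325.0), (0, "temperature_k", 293.15), (0, "flow_m3_s", 1.0),
    (1, "pressure_pa", 101325.0), (1, "temperature_k", 293.15),  # flow missing
]
engineered = engineer_mass_flow(records)
```

Returning the result in the same time, field, value shape as the raw data is what allows later steps to treat the engineered series like any other.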
Byron Ross: 00:23:15.784 And we then have taken our three values and engineered them into one. And this is something we do a lot, I think, in Influx. And Flux makes it very easy. We haven’t been able to do this in other query languages. And that’s probably on us. But it lets us do the engineering at query time, which means that we can write some tests around it. And we can test it in other environments so that when the engineers want to come and actually use that value, we’ve got a high confidence that they’re getting what they think they’re getting. We do some Grafana alerting, which is quite a nice, simple way to access some complicated functionality. So you can see here on the left, we’ve got some alert rules. And they need to be connected with notification channels on the right. And when you make that connection, you choose a query that defines the condition that you want to alert on. And then in this case, we’re sending an email. And you can go from there. I’m not sure, I think you can hit webhooks and things as well from here. But this is something we use just to give us a simple idea of whether or not the data’s working. You can see there’s a door open alert there and a weather station power cut. So these are useful pieces of information that traditionally we would have had to get another way. So in this space, what we would have got with SMS alerting systems and things like that, we can now get them straight from our data store.
Byron Ross: 00:26:43.763 There’s also, like always, a huge amount of engineering that you have to do for variable connectivity environments. These machines and most real-world implementations of IoT have either a cellular connection or some radio connection or another unreliable internet connection. So a lot of the work we’ve had to do in Node-RED is around caching and retrying to get this data into the cloud. InfluxDB has been a journey as well. Debugging Flux is difficult once the queries reach a certain level of complexity. The web UI is great for self-serving data series as they exist. It’s not so good for when you want to do in-query engineering. Most of our team has moved across to VS Code to do the query development. There’s an InfluxData Flux plugin that helps with the language. And I recommend taking a look at that. One of the key outcomes for us is that queries are the razor blades of the cloud service billing. The cloud service is awesome. I can’t speak highly enough of the way they have developed it and the way they price it. But just bear in mind that the queries get expensive very quickly, especially in our case where we have a lot of dashboards looking at the data. And particularly for our lab-based work, some of it’s quite high-frequency. Updating each element on the Grafana dashboard is running its own query. And the counts add up very quickly. Keep an eye on that.
Byron Ross: 00:28:32.522 Where are we going next? We are absolutely sticking with InfluxDB for our data storage. This is enabling our success and we think enabling our customers’ success. We are looking at a scale-out over the next months in the order of 100 times more series than we have now. Our machines are developing more sensors. We’ve got more actuators. We’re taking in information from more of the peripheral systems that we have data for. And so far, we have not hit any limitations with Influx at all. We are also moving towards putting some on-device machine learning on that little industrial controller using Python. And that’s to support our digital twin and edge compute strategies. The on-premises and InfluxDB cloud instances are really going to help us achieve our goals there. Now, I believe I’ve finished a little early. So thank you very much for your attention. I look forward to the questions. And I think I’ve got a slide here that Caitlin’s making me put up, which is this one.
Caitlin Croft: 00:29:44.144 Perfect. Thank you, Byron. That was great. Yeah. So InfluxDays is coming up here in a couple weeks. So be sure to check it out. The conference itself is completely free. I know someone already asked if the sessions are being recorded. I’m assuming you are in the APAC region. Yes, they will all be recorded and will be made available on YouTube a couple days later. So if the time zones don’t work out for InfluxDays for wherever you are, don’t worry, they will be available for replay in a couple of days after the live event. Let’s see, InfluxDays ends on the 27th. I’m guessing by the following Monday, we should have them all up on YouTube ready for everyone. All right. So the first question for you is what PLC controller are you using?
Byron Ross: 00:30:38.957 Thank you. That’s a good question. We are using a Weidmuller PLC controller. It’s an SL2000-OLAC-EC if you must know. We use actually a sort of hybrid here of a PLC and a Node-RED data collection device at the moment. The PLC has a separate core. And we’re working towards engineering the Node-RED into that PLC. But it’s an incremental gain for us. I mean, the other PLCs that we work with, the likes of Siemens and Allen-Bradley, we can’t do any on-device work with those units.
Caitlin Croft: 00:31:25.268 Cool. And you mentioned that you really like Flux. And I know that a lot of people, as they get up and running with Flux, it takes them a little bit of time. So are there any tips and tricks that you learned along the way when you - especially when you first started using it, that you wish you had known at the beginning?
Byron Ross: 00:31:42.887 I think the most important thing that I found buried somewhere in the Flux documentation was this look at the table view. Like the graph is really exciting and nice to look at. But the table view really lets you understand the structure of the data that’s being returned. And I think for a lot of people coming from the development background, certainly from my background, coming from a more procedural development background, the functional mindset is quite challenging. So incremental query development using that table view so you can really understand how each query is modifying the structure of the return information is critical. Like we wouldn’t have been able to do it without that little bit of knowledge to look at the format of the data and not at the pretty graph.
Caitlin Croft: 00:32:42.518 Yep. Let’s see. Can you elaborate on the caching/store and forward being done within Node-RED to address intermittent network connectivity? Are you using a public node or built in-house?
Byron Ross: 00:32:58.564 Sure. We are doing most of that in-house. So one of the things that’s really important for us is having a continuous data record. The actual volume of information being generated by each machine is not itself huge. So we store on the SD card. We have a rolling log. Depending on how much data is on that particular machine, the log obviously has a different length. But SD cards are pretty cheap now. And that’s our fallback data store. And we have a way to replay that in the event that the cache should fail or something should fail for a long period of time with our connection. Otherwise, the caching is done in RAM, and we store messages. It’s pretty primitive. But so far it’s been very reliable. We haven’t had to do anything too outrageous. It’s kind of a store and forward system if you like. We keep queuing the messages internally. And if we were to run out of memory, we just throw the old ones away. And we know that we still have that data on the SD card if we need it. And then when the internet connection comes back, we can play it up. The timestamp is set on-device, not on the cloud. So that’s something we actually learned early on, is make sure your timekeeping even out to the edges is pretty good, which is probably harder than it sounds in the industrial landscape.
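The in-memory part of that store-and-forward behaviour (queue messages while offline, drop the oldest when memory runs out, replay in order when the link returns) can be sketched with a bounded queue. This Python example is an illustration under those assumptions, not the in-house Node-RED flow. The class name, the `send` callback contract, and the queue size are hypothetical, and the SD card rolling log is left out.

```python
from collections import deque
from typing import Any, Callable


class StoreAndForward:
    """Bounded in-memory queue that replays messages once sending succeeds."""

    def __init__(self, send: Callable[[Any], bool], max_messages: int = 1000):
        self._send = send  # returns True when the message was delivered
        # A deque with maxlen silently discards the oldest entry when full.
        self._queue: deque = deque(maxlen=max_messages)

    def enqueue(self, message: Any) -> None:
        self._queue.append(message)

    def flush(self) -> int:
        """Send queued messages oldest first; stop at the first failure."""
        sent = 0
        while self._queue:
            if not self._send(self._queue[0]):
                break  # link still down, keep the message for the next attempt
            self._queue.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._queue)
```

A message is removed from the queue only after `send` reports success, so a failure in the middle of a flush loses nothing that is still in memory. Anything that was dropped for lack of space would have to be replayed from the on-device log.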
Caitlin Croft: 00:34:37.033 Yeah, the industrial landscape always, I feel like, adds so many more complexities than people realize with old machines, different vendors, all of those different things. I think you touched on this a little bit. Can you expand a little bit more on why you chose to move away from an on-prem solution and move to a hosted cloud solution when you were looking at time series databases?
Byron Ross: 00:35:01.777 Sure. Because we don’t want to be doing IT fundamentally. So for us, the primary data store is absolutely the cloud store. The on-premises supports real-time operations and local dashboards and information. But we can always fall back to the cloud if that falls over. We’re not DBAs. We don’t want to manage that in an intensive fashion I guess. And one of the nice things, I guess, about lots of software now is that it comes as a container, and you can deploy it. And as long as you’ve got configuration stored somewhere, it’s relatively straightforward to spin it up. We’ve moved away from on-premises as being our primary store mainly for security reasons. We trust our cloud providers to be the experts at managing and backing up and restoring and making sure we have a seamless experience.
Caitlin Croft: 00:36:00.220 Great. Is there a reason why you haven’t continued to use cloud services for managing the IoT devices and secrets?
Byron Ross: 00:36:11.833 Yeah, well there’s two parts to that. One is the secrets, which is, I guess, the identity of the devices and the trust chain that goes with that. I’ll come back around to that. The reason we’ve moved away from Azure AWS for managing the devices was the complexity of the deployment experience for us was not rewarded. We have tens to hundreds of devices in the field. These systems are contemplating tens of thousands, hundreds of thousands, millions of devices. So they come at it with a different mindset and structure that doesn’t really suit our deployment strategy. We maintain remote access to our data collection and control devices. I think in a lot of IoT deployments, it’s quite difficult to get access to the device once it’s in the field. So you need a different set of strategies for how you deploy configurations and updates and firmware changes and the like. That’s not our challenge. Our challenge is what’s reliable and easy to engineer on device, on these controllers that are not at the cutting edge of capability. And then the secrets is probably either a limitation of Node-RED or our understanding or a bit of both. The secure storage of those things was not a primary consideration in the particular implementation that we’re using. And it’s something we’re working towards. It’s how we can keep securely some credentials on the device that can only be accessed at certain trusted times and places.
Byron Ross: 00:38:10.875 It’s an ongoing program to make that a better experience for us. But at the moment, we will deploy into a new device, and we will then have to manually go and populate the secrets into the configuration. And it’s about as fun as it sounds.
Caitlin Croft: 00:38:26.304 You mentioned micro-controllers. Which can you recommend for learning and using InfluxDB, Arduino, Raspberry Pi, Jetson? You mentioned Python, which libraries?
Byron Ross: 00:38:41.443 They’re good questions. So we don’t actually do Influx from the micro-controller. I’m sure you could. And I guess anything that has got the grunt to do a web-based interface is probably what you’re going to need. We probably suggest, if you’re looking at that as a startup, to run - run Node-RED on a Raspberry Pi is how we started. And then it doesn’t matter whether you’re talking to a PLC or micro-controller or something else. The data communication channel is a separate problem. In terms -
Caitlin Croft: 00:39:19.295 In my -
Byron Ross: 00:39:20.066 Yeah. Sorry.
Caitlin Croft: 00:39:20.703 Oh sorry. Go ahead.
Byron Ross: 00:39:21.804 I was going to say in terms of Python libraries, I’m sorry, I don’t have that on my fingertips.
Caitlin Croft: 00:39:28.664 Yeah. And in terms of the micro-controllers, in my experience, most community members are playing around with Raspberry Pis just because they’re so versatile. If you look online for the virtual time-series meet-ups, there’s a lot of people who have shared how they’re using Raspberry Pis at home. So if you’re interested in learning more about that, I would check out our website for that.
Byron Ross: 00:39:51.755 Yeah, but Pis are very accessible ways to do that coordination. And there’s a huge, very supportive community. Micro-controllers, if you’re talking about ARM Cortex or something, they are more difficult to use for sure.
Caitlin Croft: 00:40:12.566 Well I think that’s the rest of the questions. But if anyone has any more, we’ll stay online here for a couple of minutes. So if you guys have any more questions for Byron, please feel free to post them. I have another question for you, Byron. You talked about the different solutions that you had considered before using InfluxDB. How was adoption of InfluxDB? It sounds like you kind of tested them out, and the team seemed happiest with InfluxDB. But I’m just kind of curious. Any other insights into how adoption was?
Byron Ross: 00:40:46.377 Adoption of any new system in an organization is always challenging. People like the way that they work. It’s very hard to promote the benefits of something, especially when as we were starting our Influx journey, we had a bit of Azure under our belt. We kind of knew what we were looking for. And there’s a lot of pushback always, “Oh but we’ve just made all this investment in the Azure tooling. And we’ve learned how to use that. And you’ve spent all this time training us. And now you want us to use a new system,” especially when the benefits seem perhaps a bit amorphous at the beginning. And I think one of the things we did with Influx was we got some of the key non-software people on board pretty early and showed them the benefits of particularly that in-query data engineering, which wasn’t - I’m sure it was possible in the Azure queries. But it certainly wasn’t as easy. And once they saw that, they kind of just pushed the adoption by stealth, which was great for us. And then they said, “Oh why are you bothering -“ we actually ran parallel connections for quite a long time. We had our Azure and Influx infrastructure running side-by-side. And I think eight months ago, we turned off the Azure connection in terms of the ingest. Yeah, you’ve got to find your internal champions and get them to drive it.
Caitlin Croft: 00:42:17.911 That is true. Towards the end of your talk, you mentioned that in the future, where you guys are going, you’re going to be scaling out on having so much more time-series data. Have you guys started to look at downsampling and looking at changing the frequency of data collection? Because I know that when you start off with time-series data, you need it maybe every millisecond. But then once you have that initial data set, maybe you need it every second or something like that. So I was just curious if you guys are looking into that.
Byron Ross: 00:42:50.726 Yeah, absolutely. I mean, downsampling for historical data is critical. And in the old days, back in that sort of 2007 experience, you had to do it early and very aggressively. You had to really [inaudible] it for the long-term. We haven’t found the limitation yet with the amount of data we’ve got. We know it’s coming. And one of the next layers that we’re going to build out on Influx is, I guess, that configuration strategy for how long should this data persist at that frequency and then what’s the next frequency it falls back to and so on, and what’s our, I guess, at-rest data series frequency that we’re interested in. Yes.
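As a minimal sketch of the tiered retention idea Byron describes here (how long data persists at one frequency, and which frequency it falls back to next), the configuration could be expressed as a list of age thresholds and bucket widths. The tier values, `TIERS`, `interval_for_age` and `downsample` are hypothetical names and numbers chosen for illustration; they are not Graphite Energy's settings and not an InfluxDB API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical retention tiers: (maximum age in seconds, bucket width in seconds).
# The numbers are illustrative only.
TIERS = [
    (7 * 86400, 1),        # up to 7 days old: keep 1-second data
    (90 * 86400, 60),      # up to 90 days old: 1-minute averages
    (float("inf"), 3600),  # older than that: hourly averages (the "at-rest" frequency)
]

def interval_for_age(age_s, tiers=TIERS):
    """Return the bucket width (seconds) that applies to data of the given age."""
    for max_age, interval in tiers:
        if age_s <= max_age:
            return interval
    return tiers[-1][1]

def downsample(points, interval_s):
    """Average (timestamp, value) pairs into buckets of interval_s seconds."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % interval_s].append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]
```

For example, `downsample(points, interval_for_age(age_s))` would reduce a series to whatever frequency its age calls for. In an InfluxDB deployment the same policy would more likely be implemented with scheduled tasks and bucket retention periods than in client code.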
Caitlin Croft: 00:43:40.177 What was the biggest hurdle for getting people comfortable with time series data versus SQL and others? If there was a hurdle, how did you sell the team?
Byron Ross: 00:43:53.614 Oh god, that was easy. SQL was so painful. Look, it’s great for what it’s great for. Nothing else will touch it for relational information. But for the time series, I think the thing that got an alternative over the line was the amount of time that the queries were starting to take. So as we got a lot of breadth in the data that we were storing in SQL, queries would take a noticeable amount of time to run. You’d hit F5, and you could make a cup of coffee. And this was like old mainframe horror days, the stories you used to hear. And we knew there was a better way. And for a long time, we didn’t have the words. And then we had the words, but we didn’t have the technology. And now we have both, which is great.
Caitlin Croft: 00:44:40.211 Awesome. So someone’s asking what the company is and how they can talk with you afterwards. So the organization is called Graphite Energy. And, Francis, if you’re interested in talking to Byron, I’m happy to connect you afterwards. You should have my email since I set up the Zoom and everything. So feel free to email me. And I’m happy to connect you directly with Byron so you guys can talk shop more over email afterwards.
Byron Ross: 00:45:07.557 For sure, always happy to chat and happy for you to do that in the first instance, Caitlin.
Caitlin Croft: 00:45:14.006 Awesome. All right. There’s another question for you. How are - or are you using AI in some way, maybe when downsampling, to detect significant events and store those, discarding data that has very little information in it?
Byron Ross: 00:45:29.552 This is always the challenge with what to keep because everybody wants to keep everything at infinite precision. You want to have a record of the world as it was. We’re doing a lot of ML work. I’m not quite sure where ML crosses over into AI. Call me a bit of a cynic, but AI, I think it’s bandied around quite a lot. Some fairly straightforward statistical techniques will do almost all of the heavy lifting for us. We can get rid of outliers, and we can identify bad sensors pretty reliably just using normal techniques. Where it gets really interesting is some of the work that this German company we deal with has been doing around trying to find operational islands of stability, if you like. So if you have a machine that has minimum and maximum values for three parameters, the machine will operate inside that window fine. But there’ll be these little islands in there where it operates better than baseline. And this is where AI is kind of helping to learn where those little islands of optimization are within your allowable operating envelope. So that’s where I think AI will come into our future. Sensors tend to fail pretty reliably badly. They go off, or the value becomes outrageous, or the data becomes intermittent. These things are not particularly difficult to find. We don’t find that we get a lot of sort of slow drift or something like that that causes problems or other things that AI may be able to identify in that space. AI definitely has its place, but not sort of, I guess, for us at the moment in data cleaning but more in trying to find out where we should be operating these machines.
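The "fairly straightforward statistical techniques" mentioned here are not spelled out in the talk. As one possible sketch, assuming readings arrive as time-sorted `(timestamp, value)` pairs, the three failure modes Byron lists (sensor goes off, value becomes outrageous, data becomes intermittent) can be caught with simple rules, and outliers can be dropped with a median-absolute-deviation filter. The function names, thresholds and the choice of the MAD filter are assumptions for illustration, not Graphite Energy's method.

```python
from statistics import median

def sensor_faults(points, lo, hi, expected_period_s, flat_run=5):
    """Return fault labels for time-sorted (timestamp, value) pairs."""
    faults = set()
    times = [t for t, _ in points]
    values = [v for _, v in points]
    # "The value becomes outrageous": any reading outside the plausible range.
    if any(v < lo or v > hi for v in values):
        faults.add("out_of_range")
    # "The data becomes intermittent": a gap over twice the expected period.
    if any(b - a > 2 * expected_period_s for a, b in zip(times, times[1:])):
        faults.add("intermittent")
    # "They go off": the last flat_run readings are all identical.
    if len(values) >= flat_run and len(set(values[-flat_run:])) == 1:
        faults.add("flatlined")
    return faults

def drop_outliers(values, k=3.5):
    """Drop values more than k median absolute deviations from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to measure against, so keep everything.
        return list(values)
    return [v for v in values if abs(v - med) / mad <= k]
```

A call such as `sensor_faults(readings, lo=0, hi=100, expected_period_s=1)` returns an empty set for a healthy sensor, so it can be used directly as a per-sensor health flag before any data is written or downsampled.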
Caitlin Croft: 00:47:29.424 Okay. Thank you, Byron, for answering everyone’s questions. If anyone has any last-minute questions, please feel free to post them for Byron. We’ll stay on for another minute. I really appreciate everyone staying on. I know our webinars are usually at a different time. But it seems like lots of people like joining them maybe in APAC or in the US in the afternoon. So we really appreciate it, love all the questions. I hope to see everyone at InfluxDays in a couple weeks. So it’ll be a really fun event.
Byron Ross: 00:48:04.759 Thank you, Caitlin. And thank you guys for your attention and your questions, and sorry for my rambling.
Caitlin Croft: 00:48:10.784 Hey, I thought your rambling was interesting. So it’s not rambling.
Byron Ross: 00:48:14.201 Oh it’s discursive.
Caitlin Croft: 00:48:17.532 Fantastic. Well, thanks all of you for joining today. And I hope you have a good day. Thank you again, Byron.
Byron Ross: 00:48:25.018 Thank you, Caitlin. Thanks everyone.
Chief Operating Officer, Graphite Energy
Byron is the Chief Operating Officer of Graphite Energy and is responsible for overall technology development. He has delivered bespoke industrial technology across Oceania, Europe and China over the past 20 years. Byron successfully grew an industrial technology company and sold it to a customer in 2015. He has been responsible for developing industrial solutions from the microcontroller to the cloud covering hardware, firmware and software. Most recently he has been involved in developing solutions that enable widespread and rapid decarbonization of industrial heat, by taking renewable electricity when it is available and converting it into heat for when it's needed.