How Crosser Built a Modern Industrial Data Historian with InfluxDB and Grafana

Session Date: May 05, 2021
Time: 8:00am (PT) | 3:00pm (GMT)

Crosser are the creators of Crosser Node, a streaming analytics platform. This real-time analytics engine is installed at the edge and pulls data from any sensor, PLC, DCS, MES, SCADA system or historian. Their drag-and-drop tool enables Industry 4.0 data collection and integration. Discover how Crosser’s easy-to-use IIoT monitoring platform empowers non-developers to connect IIoT machine and sensor data with cloud services.

In this webinar, Dr. Göran Appelquist will dive into:

Crosser's approach to enabling better IIoT data analysis and anomaly detection
Their methodology for equipping their clients with ML models by supporting all Python-based frameworks
How Crosser uses InfluxDB time series platform for storage


[et_pb_toggle _builder_version=”3.17.6” title=”Transcript” title_font_size=”26” border_width_all=”0px” border_width_bottom=”1px” module_class=”transcript-toggle” closed_toggle_background_color=”rgba(255,255,255,0)”]

Here is an unedited transcript of the webinar “How Crosser Built a Modern Industrial Data Historian with InfluxDB and Grafana”. This is provided for those who prefer reading to watching the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.

Speakers:

Caitlin Croft: Customer Marketing Manager, InfluxData
Dr. Göran Appelquist: CTO, Crosser

Caitlin Croft: 00:02 Hello, everyone, and welcome to today’s webinar. My name is Caitlin Croft. I’m very excited to be joined by Göran, who’s the CTO at Crosser. So he will be going into how to create your own historian using InfluxDB, Crosser, and Grafana. Once again, this webinar is being recorded and the recording and the slides will be made available later today. And without further ado, I’m going to hand things over to Göran.

Dr. Göran Appelquist: 00:33 Thank you, Caitlin, and welcome everyone to today’s webinar. So I’m working for the company Crosser, which is based out of Sweden. So we are a software company founded five years ago. We have two offices in Sweden, and I’m based out of the Stockholm office. We also have presence in Germany to cover the [entire?] region where our customers are located. So our mission at Crosser is to provide the best edge streaming analytics solution on the market and specifically for different industrial use cases. So we have decided to target these industrial use cases, and our primary customers are either factory floor owners or operators that want to implement different types of edge analytics use cases on their factory floors, whether for building use cases between their machines or connecting machines with other on-premise or cloud-based systems. The other type of customer we’re addressing is machine builders that want to enhance their machines with on-machine analytics capabilities, for example, to be able to offer better services and maintenance offerings with their machines.

Dr. Göran Appelquist: 01:59 So before we get into the topic of the day, I just want to position where we sit in a typical data pipeline. So as I mentioned, we target industrial use cases, which means that in most cases, the data that we work with is coming from machines. And our goal then is to be able to collect and also analyze this data as close as possible to these machines. So we typically sit very close to these machines, even on the machine itself or on a gateway or some kind of industrial server or virtual machine that you have available close to these machines. And then, in our software, we collect the data from these machines and also analyze the data immediately when it comes off the machines. And then we can send off the result to different destinations, which could be back to the machines to implement different use cases that I will come back to. But it can also be to send the data to other systems further up in the hierarchy. It could be databases, visualization systems that we have on the premises, but also different cloud systems, either our customer’s cloud systems or maybe their customers want to use some SaaS services that are available on the internet to do something based on the data that is analyzed on these machines.

Dr. Göran Appelquist: 03:21 And what we offer is what we call a streaming analytics [inaudible], which means that we operate on data in real time. As soon as data arrives into our software, we do something with the data and then immediately send off the results to wherever that is needed after the analysis is completed. So we don’t store any data as part of our solution. And that’s why it’s a nice combination to work together with solutions like InfluxDB so we can deliver data to databases. We can also use databases as inputs to the analysis that we do in these edge devices. So storage and visualization, those kinds of solutions are not part of the Crosser solution.

Dr. Göran Appelquist: 04:05 So based on this solution that we offer, we can then address a number of different use cases, and I’ll try to summarize some of the most common use cases where we come into play today, starting with very basic use cases, but still a very common starting point for a lot of our customers. So the very common scenario is that customers come to us, and they have machines and they know that they have a lot of data in these machines, but they don’t know how to get access to that data. It’s inside their machines. It’s accessible through a lot of different protocols. Some of these protocols are very old. Lots of different data formats are being used and so on. So then we can come in and help them just extract the data that they already have so that they can start to make some use of that data and figure out what kind of values and insights they can derive from the data. So the first use case is often just pulling out the data, maybe do some basic transformation of data formats to make it more useful and then send it off to some storage system, either locally or in the cloud. It says cloud in this picture here, but you can see that as some kind of more centralized system. So that’s a very common starting point. And then after that, you can start to add on more advanced use cases.

Dr. Göran Appelquist: 05:27 The second use case is a more specialized, cloud-specific use case where we can also help our customers to save on some of the cost for cloud services. So instead of sending data through the more advanced-type endpoints that you have on the cloud services, if you just want to store your data, we can then deliver data directly from the edge to the storage systems that you have in the cloud. Whether that is a data lake or some storage system or it’s a database, we can then send data directly to those storage systems, thereby bypassing some of these other services that you don’t need for these use cases and saving some cost.

Dr. Göran Appelquist: 06:07 Number three is about transforming data which I would say is a common denominator for all the use cases, I don’t think we’ve seen any single use case where you don’t need to do some kind of data transformation. And that comes from two reasons, basically. One is that data very rarely arrives in a format that you want or need, even if it’s just a single source. It’s usually not in a format that you can just take and send on to whatever system that you want to use the data in, if it’s a database or some other system. And the other reason is that you, in many cases, need to collect data from multiple sources and then the data will most likely arrive in different formats from these different sources. So you need to do some kind of harmonization to get the data into a format that is independent of the original source to make it easier to work with. So data transformation is usually used in every use case to some extent. And some use cases like the first ones, that might be the only thing that you do in the edge and in some other use cases, this is the starting point. And then you add on more advanced processing after the first transformation stage. And in some cases you need to do transformation also on the output to adapt the format to different destinations. If you have multiple destinations out as a result of your processing, you might need to restructure the data for each of those destinations.
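
As a rough illustration of the harmonization step described above, the following Python sketch maps two differently shaped payloads onto one common record format. The payload shapes, field names, and function names are assumptions made for this example; they are not taken from the webinar or from the Crosser module library.

```python
from datetime import datetime, timezone


def from_plc(raw: dict) -> dict:
    """Normalize a hypothetical PLC reading such as
    {"tag": "press1.temp", "val": 71.5, "ts_ms": 1620226800000}."""
    return {
        "source": "plc",
        "sensor": raw["tag"],
        "value": float(raw["val"]),
        # Epoch milliseconds -> ISO 8601 string in UTC
        "timestamp": datetime.fromtimestamp(
            raw["ts_ms"] / 1000, tz=timezone.utc
        ).isoformat(),
    }


def from_mqtt(topic: str, raw: dict) -> dict:
    """Normalize a hypothetical MQTT message published on a topic such as
    "factory/press2/temp" with body {"v": "70.1", "time": "<ISO 8601>"}."""
    return {
        "source": "mqtt",
        # Drop the first topic level and join the rest: "press2.temp"
        "sensor": ".".join(topic.split("/")[1:]),
        # The value arrives as a string in this made-up format
        "value": float(raw["v"]),
        # Convert whatever offset the sender used to UTC
        "timestamp": datetime.fromisoformat(raw["time"])
        .astimezone(timezone.utc)
        .isoformat(),
    }
```

After this step, downstream logic only has to deal with the keys `source`, `sensor`, `value`, and `timestamp`, regardless of where a reading came from.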

Dr. Göran Appelquist: 07:30 Number four is also a common use case, which is more about connecting machine data with other on-premises systems. In most cases, could be that you want to either just transfer data from your machines into a local system like an ERP system or SCADA system or some other of the typical automation systems that you see on the factory floor. But in many cases, you also see in these use cases that you start to add some logic in addition to just collecting the data. So you might want to define some conditions based on your data, like checking that something has happened. Maybe you will see that the machine is running out of oil and you want to trigger a work order in your ERP system so that someone will come down and fill up the oil in your machine or whatever might have happened. So it’s more about starting to add actions on top of your data and then integrate those actions with other systems that you have on the factory floor.

Dr. Göran Appelquist: 08:31 Number five, remote condition monitoring. Here, we’re talking mostly about mobile assets, trucks, cranes, harvesters, those types of machines that are out in the field and where edge analytics is used to solve two different problems. One is that you often have limited connectivity with these machines. It could be both limitations in terms of bandwidth and also intermittent connectivity. Maybe you are out in the field where you don’t have mobile connectivity, so you cannot send data all the time, and then edge analytics can be used to collect and store data until you have connectivity, and then you can upload all the data when you’re back in an environment where you do have connectivity. But it can also be used as a way to analyze more data. Let’s say that you have limited bandwidth, so you cannot send all the sensor data that you have available on the machine using that low bandwidth that you have. By providing edge analytics on the machine itself, you can still analyze all the data that you have on the machine and either just send the result of that analysis over this limited bandwidth or use the results locally to help or inform the driver or user of this machine locally. So that’s also a scenario where edge computing makes a lot of sense.
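
The store-and-upload-later behavior mentioned here can be sketched in a few lines of Python. This is a minimal in-memory sketch under stated assumptions: the class name, the `send` callback, and the buffer limit are all hypothetical, and a real deployment would likely persist the buffer to disk.

```python
from collections import deque
from typing import Callable


class StoreAndForward:
    """Buffer messages while offline and deliver them in order once online.

    `send` is a caller-supplied function (hypothetical here) that returns
    True when a message was delivered and False when there is no connectivity.
    """

    def __init__(self, send: Callable[[dict], bool], max_buffered: int = 10_000):
        self._send = send
        # A bounded deque drops the oldest message when the buffer is full
        self._buffer = deque(maxlen=max_buffered)

    @property
    def pending(self) -> int:
        """Number of messages still waiting for delivery."""
        return len(self._buffer)

    def publish(self, message: dict) -> None:
        """Queue a message, then try to deliver everything that is queued."""
        self._buffer.append(message)
        self.flush()

    def flush(self) -> int:
        """Send queued messages oldest-first; stop at the first failure."""
        sent = 0
        while self._buffer:
            if not self._send(self._buffer[0]):
                break  # still offline, keep the message for the next attempt
            self._buffer.popleft()
            sent += 1
        return sent
```

Calling `publish` for every new reading is enough: while the link is down the readings accumulate, and the first successful send drains the backlog in arrival order.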

Dr. Göran Appelquist: 09:55 And then in the bottom row, we’re getting into more advanced use cases where you start to add more advanced logic on top of your data collection. So this could be either custom algorithms that you have developed. In many cases, it’s machine learning models that you want to apply at the edge. And these are also supported with our solutions. So you can extend the basic data analysis operations with more advanced logic if needed. And number eight is about using visual sensors. So not just collecting data from the machines, but using cameras is becoming quite common these days, driven by cameras being very cheap and also the fact that it’s very easy to set up these kinds of solutions. You don’t need to connect into any of the existing infrastructure. You can set up a camera system basically on any machine. Doesn’t matter how old or what type of machine that is, a camera can always be used for different use cases. It could be to monitor the produced product, to detect faulty products, or other types of anomalies. It could be used to detect people getting too close to a machine that is dangerous and stuff like that. So there are a lot of use cases where vision is very useful and the camera produces a lot of data. So again, analyzing that video feed close to the camera makes a lot of sense instead of sending the raw video feed upwards in the network hierarchy.

Dr. Göran Appelquist: 11:29 Yeah, I think I skipped the last two. They are more special and we leave them out for today. So this is where Crosser is typically used. So a lot of industrial use cases. And today, we’re going to talk about another solution that you often see on the factory floor or in the same environment, and that is a Historian. So I just wanted to start with defining what we mean by a Historian. So a Historian is something that has been around for many years. And I think, in short, we can summarize that in this context as being a time series database. That’s basically what it is. But it’s a time series database which is designed for this specific use case and this environment. So it typically has inputs that are tailored for this environment, with the type of protocols and so on needed for getting data from these machines, and then you store time series data from the machines. It’s also often combined with visualization tools so that you can take a look at your data. And there are often processing capabilities to allow you to extract data from it, mainly for visualization, but also to do different types of analysis of the data, comparing historical data with live data, doing different types of aggregation and so on.

Dr. Göran Appelquist: 13:02 You typically can store data for a very long time. Could be up to years in these databases. But the key thing here, and which is the reason why we have this webinar today, is that often these Historians that you see today, they are part of the traditional automation stack and are tightly integrated with that stack. And that is one of the reasons why it might be interesting to look at other ways that you can design the same functionality and also for historical reasons, these applications are typically local applications running on a single machine, which in these environments, are most often a Windows server or some type of Windows machine. So it’s a local Windows application that you run and access locally on that machine. So that is what we mean with a Historian.

Dr. Göran Appelquist: 13:55 So then the question is, why do we need a new type of Historian? And I think there are two main reasons why you want to take a look at this and why this might make sense. And the first one is that there are today new requirements on these type of equipment. First of all, there’s more types of data sources that you might want to use and store in this database, and not just the machine data. Maybe you want to connect other types of sensors that have been added that is not supported by the existing solutions. And also, we’re now in an environment which is much more heterogeneous. There are lots of other systems that might want to get access to this data. It’s not just something that is used inside the automation stack where you transfer data between these traditional systems that you have there, like SCADA and ERP and so on. So there might be a lot of different users that want to get access to this data. Maybe you want to use this data to train your machine learning model, so you want to connect it into other systems that didn’t even exist when these current solutions were designed. So new requirements is one reason why you might want to look at other ways of building this. And also, of course, new technology, a lot has happened since these systems were designed. And by using the latest technology where you combine the best-of-breed solutions for each of the individual parts, you can get something that offers this new functionality in a more efficient way. And it also gives you more flexibility when it comes to how you want to deploy this, and I will come back to that in a few slides.

Dr. Göran Appelquist: 15:45 So what we have been looking at and what we will talk about today is how you can build a Historian or a system that has the same kind of capabilities as a traditional Historian, but by combining modern tools like you see on the screen here. So what we have here is, first of all, the Crosser solution, which is used to collect data from these types of systems that we see in these environments, the machines and so on, that we have talked about. We can also use Crosser to normalize our data formats to something that is suitable. We can potentially pre-process the data if needed before we ingest it into a database and store it. But we can also use the analytics capabilities of the Crosser solution to analyze the stored data so we can use now historical data in our analysis, either as is or combined with the live data to create different types of use cases like anomaly detection or predictive maintenance and so on. So that’s the first step.

Dr. Göran Appelquist: 16:52 And then, to get the historian capabilities, we need a database. And then of course, InfluxDB is a perfect choice for that by providing a high performance database that is optimized for storing time series data. So that’s a perfect match for these kinds of use cases. It’s schema-less, which means that we can store any type of data structures in this database without having to predefine any schema, which also makes a lot of sense in these environments. We don’t really know what type of data structures we will get. We can use the Crosser tool to restructure it, but still later on, we might want to add more data points or new types of data and so on, and then, we can easily do that with this type of database. And then, Influx is an open platform where you can extract data using the REST APIs so anyone can get access to the data and use it for different use cases. It doesn’t have to be something that is contained within the Historian application. So anyone that needs access to the data can easily get access and extract the relevant data out of the database.
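
To make the schema-less point concrete, here is a small Python sketch that formats readings as InfluxDB line protocol, the text format InfluxDB accepts for writes. It is a simplified formatter written for illustration only: the measurement, tag, and field names are invented, the escaping covers only the common cases, and in practice an official InfluxDB client library would normally handle this step.

```python
def _escape(text: str, chars: str) -> str:
    """Backslash-escape each character in `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value) -> str:
    """Render one field value using line protocol type conventions."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Everything else is written as a double-quoted string
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement: str, tags: dict, fields: dict,
                     timestamp_ns: int) -> str:
    """Build one line: measurement,tag=v,... field=v,... timestamp."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys keep the output deterministic
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"
```

Because no schema is declared up front, a later point for the same measurement can simply include an additional field (for example a hypothetical `vibration` value) without any migration step.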

Dr. Göran Appelquist: 17:55 And then finally, if you want to visualize the data that you have, then Grafana is a perfect add on to complement the database so that you can build your own customized dashboards and visualize the data that you have in the data- and the data that you have in the database. Sorry. So these three components together can help you, in an easy way, build a modern Historian that has all the features that you typically want to have in these kind of applications.

Dr. Göran Appelquist: 18:31 So what we get then is something where you can collect data from basically any source. Crosser provides a wide library of connectors for hooking into different data sources, not just machines. There are lots of connectors for other types of data sources that you can combine to get data into your database and also pre-process and clean up your data before you store it in the database. It’s easy to add new data over time when you realize that you need to store additional data. That can easily be added into the solution and be stored in the database. And then you can use the Grafana tools to build your own customized dashboards, and then, that’s a very flexible thing with lots of different ways of visualizing data so that you can build a dashboard that is tailored for your specific needs. And then once you have data in the database, you can also use the Crosser tools to analyze the historical data, either with some basic logic, but you can also introduce or combine it with your own custom logic or machine learning models that you want to run and apply on your stored data.

Dr. Göran Appelquist: 19:41 And this whole solution with these three components, it’s very easy to set up, as I will show you in the following slides. So this is, it’s not an out-of-the-box solution, but very close to that, I would say, it’s typically something that you set up in a very short time frame, even if you start from scratch. So even though it’s three different systems that we connect together here, it’s very easy to get this up and running in any type of environment that you might have. So now, we will look at some deployment scenarios because that is one of the advantages you get with this type of solution, that you’re not stuck with having one local application that runs off one server and you need to sit next to that server to get into your data. Now, you have much more flexibility in terms of deployment so that you can build different setups that match your needs and requirements. So for these examples, I have assumed that we will use a containerized deployment where, first of all, the Crosser software running at the edge is a container. But I have also assumed that we combine the Influx and Grafana tools into a single container. You, of course, have the option of installing these as separate containers as well, which gives you even more flexibility. But for these examples, I’ve decided to include them in one container to make it a little bit simpler to explain. But that is, of course, an option as well. So the starting point is to build a local Historian, basically a replacement for the sort of legacy systems that you have out there.

Dr. Göran Appelquist: 21:17 So what you need to do then is to install the Crosser container and the Influx Grafana container on the same host, which would then be something that runs close to the sources of your data or the machines. So with this type of setup, you then get a Historian with the local UI. You have your Grafana dashboards that are available locally. It’s still a web interface. You can, of course, access it from anywhere as long as you have access to this host system. But typically, this type of setup is for use cases where you want to have a local UI close to the machine so that you can monitor what is happening in your machine. So here, we use a single platform or host for both the Crosser tools and the Influx and Grafana tools, and this you can install one per machine or one per factory floor or whatever makes sense for your specific setup.

Dr. Göran Appelquist: 22:19 The next step is that we scale this up a little bit, so we move or separate the Crosser tools from the Influx and Grafana tools. So now, we have a separate Crosser instance next to each machine to collect the data and pre-process the data, and then we send it upwards to a single instance of Influx and Grafana, which could then be a more centralized installation, maybe on a factory floor or a part of a factory or whatever makes sense. So in this way, we can scale up the data collection and pre-processing capabilities by hosting the Crosser nodes close to the sources, but still have a single user interface and database where we store all the data. And these distributed Crosser nodes that we have next to the machines can, of course, be used in parallel to implement other use cases. Maybe we want to build machine-to-machine use cases where low latency is critical. They can still run in parallel with the data collection into the database on these distributed nodes that you have next to the machines. And if we want, we can also complement this setup with a Crosser node next to the central or more central database instance. If we want to do analysis of the historical data, for example, then that is also an option that you can have with this second example. So this is one way of scaling, and there are, of course, other ways you can scale depending on how you decide to deploy these containers. That’s one of the beauties of having a containerized deployment setup. It’s easy to scale and distribute these containers wherever they make sense.

Dr. Göran Appelquist: 24:06 And as the final example, we’ve taken it just one step further. It’s very similar to the previous one. But in this case, we moved the Influx and Grafana tools into the cloud so that we can monitor or collect data from multiple sites instead. So now, we have a Crosser instance or multiple Crosser instances in each of these sites, which could be different factories or something else to collect and pre-process the data. And then they feed this centralized database that now sits in the cloud. So it’s very similar to the previous option. But we’re now collecting data from multiple sites and deliver over the Internet up to the cloud. But otherwise, the deployment setup is very similar. So this is just another way of using the same kind of architecture. So with these kind of tools, you get a lot of options for how you want to deploy your Historian instance. You’re no longer tied to having a dedicated local system. You can decide to distribute it in different ways and scale it depending on your specific needs.

Dr. Göran Appelquist: 25:14 So now, I’ve been talking a lot about the Crosser node. So let me present a little bit more about what Crosser is doing and what the Crosser solution is doing. So basically the reason why Crosser exists and why we started this company was that we saw a need to simplify how you build and deploy these types of edge analytics solutions. Most of the solutions we saw on the market, they were designed for software developers and software developers are typically a scarce resource among this customer base. So we wanted to provide a solution which enabled also other types of resources to create, design, these types of use cases and also roll them out into production. And the way that we do that is by providing a library of pre-built modules that implement a lot of the common functionalities that you see across these use cases. And then we have a graphical design tool where you combine these modules into processing flows. So instead of writing code, you combine these pre-built modules.

Dr. Göran Appelquist: 26:19 Then, of course, we know that we will never cover every possible use case with these pre-built modules. So it’s also been important to offer flexibility when it comes to adding custom functionality on top of what Crosser offers so you can complement whatever we offer in different ways. You can combine standard modules with modules where you can plug in your own code. You can plug in your own machine learning models to run together with these standard modules, and you can even develop your own modules and extend the library with whatever custom functionality that you need.

Dr. Göran Appelquist: 26:56 And then, finally, it’s also about lifecycle management, because that’s another common thing that we see among customers. It’s often quite easy to get up and running with the use case in a lab or POC environment, but then to take it into production and especially at scale is much harder. And a lot of the projects get stuck when you reach that point. So we also offer tools to help you bring your solutions or use cases into production and also help you manage those over their lifecycle because we know that wherever you start, that will never be the end game. You need to do updates. You want to add new data sources, you want to improve your logic and so on. So being able to do that in an easy way, even when you have lots of these distributed environments in operation and do that in a secure way, if you implement these kind of solutions on the factory floor, they can be quite critical for your production. So then of course, you’ll want to make sure that it’s only the authorized personnel that can make changes in your production environment. So that is also part of the Crosser solution. So it’s designed as an enterprise system with the typical requirements that you have on those kind of systems.

Dr. Göran Appelquist: 28:15 So we came up with a solution that looks like this. So we have two components in our solution where the first one is the Crosser node that you see on the left hand side in this slide. And that is what I’ve been talking about in relation to Influx and Grafana. So that was what you saw on the previous slides. So this is a software that is installed at the edge wherever you want to collect and analyze data. So this typically runs on the machines or on the gateway sitting next to the machines or some industrial server that you have on the factory floor. It’s installed as a Docker container. So we try to be agnostic to hardware so we can run it wherever you can run Docker, which is almost everything these days. It has a very small footprint. It starts at 100 megabytes in size, so you can run it on some very lightweight devices. But this software is a generic software, meaning that it’s the same software that you install everywhere, independent of the eventual use cases that you want to deploy into these nodes. So this software by itself is pretty stupid. It doesn’t really do anything. It’s just an engine which you can then configure for different types of use cases.

Dr. Göran Appelquist: 29:29 And to add some intelligence into these nodes, we use the second part of our solution, which is the Crosser cloud service that you see on the right hand side. So this is a web service that we host for our customers and that provides the user interface into the system. So there you find the design tools to design your use cases and also the deployment tools to help you deploy these use cases into these edge nodes. And the way it works is that you first design these use cases with our designer, but then you have the issue of verification because you want to verify that these applications that you have built are doing what you want. And these use cases in most scenarios operate with data sources that are only available from these local environments. If you’re connecting to a machine, that is typically not accessible from the internet. So if we sit in the cloud and design these use cases, we still want to make sure that we can verify them with the actual data sources. So then, we allow the design tools to be connected to any of the nodes that you have in the field so that you can interactively test run your designs on one of these remote nodes which has access to those data sources that you have used so that you can verify what you have built with the live data from your data sources before you decide to deploy whatever you have built into these nodes permanently. So that’s the next step.

Dr. Göran Appelquist: 30:57 And then when you have something that you think is going to work, you can then deploy that into these nodes for local operation. And from that point onwards, all the processing runs autonomously in these edge nodes. There is no longer any dependence on the Crosser cloud service. So the Crosser cloud is just a configuration and management tool that you use whenever you want to make configuration changes to these nodes. Once you have deployed something permanently into these nodes, then there is no need to use Crosser cloud and you can even disconnect the nodes from the internet if needed.

Dr. Göran Appelquist: 31:35 And you can run multiple applications, as I mentioned. So if we build this Historian use case, that could be one application that you run on an edge node, but you can run any number of applications in parallel on these nodes, and they will run independently of each other. So you can update one without impacting the others. But whatever you do, you never change that container that you have at the edge. So that stays the same. And once you’ve installed it, you do all the configuration and deployment of different use cases from the Crosser cloud without having to update the local container software.

Dr. Göran Appelquist: 32:10 And when you build logic, as I mentioned at the start, we have this library of modules that you use to create these different use cases. And there are basically five categories of modules that we provide. There are input modules to connect into different data sources, machines, databases, files, APIs and so on to get data in. And one of the key things with our system is that we haven’t made any assumptions on data formats. We can accept data in any format coming into our system, and then we have a lot of modules in our library to help you restructure data so that you can get data in the format that you want. So that’s the next category of modules. Transformation modules are very common in these cases. And then you can start to add logic on top of the data that you have collected, starting with basic condition rules all the way up to running custom code and machine learning models and applying them on your data. And then, of course, you want to use the result of your analysis somewhere. So then you have output modules to send the results to different destinations, which basically mimics what you have on the input side so you can send data back to the machines, maybe to build process optimization workflows where you calculate new settings for a machine and send that back to the machine, or it could be an anomaly detection use case where you want to stop the machine to prevent damage or remove faulty parts from the production line so you don’t spend resources unnecessarily on the faulty parts, but also connecting to databases, sending data to the cloud and so on. That is all available in this library.

Dr. Göran Appelquist: 33:54 And then you use these modules to build flows using our design tools. This is an example of that where we have- so this is a typical data flow where data flows from left to right. On the left-hand side, we have some examples of input modules, here exemplified with some machine connectors connecting to PLCs in a machine, but also an MQTT input to get data from MQTT sensors. So each of these nodes comes with its own MQTT broker, which you can use so that other sensors can just publish data into this broker, or if you already have a broker, you can, of course, connect to that as well. Same thing with HTTP. So there is also an HTTP server in each of these nodes so that you can expose HTTP endpoints for someone to publish data that you need to use in a flow like this.

Dr. Göran Appelquist: 34:45 And then we have a set of modules to reformat data because these three different sources, they will produce data in different formats. We start by harmonizing data so that in the middle, we have the same format for our data, independent of the original source. And then we can start to apply some processing on the data. And now we just do some basic stuff here to exemplify that, to show that you can have different types of processing for different destinations. At the top, we do some basic aggregation of the data. Now this is streaming data, so we get time series data from a number of sensors coming in here, so this aggregation module that you see here, it will aggregate data over a time window for each of the sensors that we receive and then calculate some statistics like average, and min and max and so on. And then we send that up to the Amazon [Piety?] endpoint, as an example.
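
To make the windowed aggregation described above more concrete, here is a minimal Python sketch. It is not Crosser's aggregation module; the function name, the `(sensor_id, timestamp, value)` tuple layout, and the 60-second window are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_windows(readings, window_seconds=60):
    """Group (sensor_id, unix_ts, value) readings into fixed time windows
    per sensor and compute average, min, max and count for each window."""
    buckets = defaultdict(list)
    for sensor_id, ts, value in readings:
        # Align each timestamp to the start of its window.
        window_start = int(ts // window_seconds) * window_seconds
        buckets[(sensor_id, window_start)].append(value)
    return {
        key: {
            "avg": mean(values),
            "min": min(values),
            "max": max(values),
            "count": len(values),
        }
        for key, values in sorted(buckets.items())
    }


# Hypothetical sample data: two sensors, timestamps in seconds.
readings = [
    ("temp-1", 0, 20.0),
    ("temp-1", 30, 22.0),
    ("temp-1", 65, 21.0),
    ("pressure-1", 10, 1.0),
]
summary = aggregate_windows(readings)
```

Each key in `summary` is a `(sensor_id, window_start)` pair, so one record per sensor and window is what would be forwarded instead of every raw reading.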

Dr. Göran Appelquist: 35:33 In the middle, we have an example of how you can combine these standard modules with your own custom algorithms. So one way to do that is to use this Python module that you see here. So this is basically an empty shell where you can plug in your own Python code and combine it with these standard modules. So here you get access to a standard Python environment. You can also define any third-party libraries that you need to be installed to run your code. And then when you deploy a flow like this, those libraries will be installed automatically by that distributed node before it starts executing the flow. And then, we send the result of that analysis, which could have been a machine learning model, for example, we send that back over MQTT, maybe to some local system. And then, at the bottom, we create some message and send it to an ERP system, in this case, which could be an on-premises system or a cloud-based system. So this is how you design use cases with our design tool.

Dr. Göran Appelquist: 36:38 And then once you have built something that you’re happy with, we then get into the deployment tools that I mentioned. So the basic idea with our tool is that these flows that you build, they should be self-contained, meaning that once you have built the complete flow, you can deploy it into any node that you have in the field without needing to know anything about that local node. So all additional information that is needed for that node to execute your flow is part of the flow configuration. In addition to what you see, [inaudible] with these connected modules, it could also be other things like additional files that you need to be available locally. Let’s say you want to run a machine learning model, then that machine learning file needs to be available in local storage before you start executing this flow. So then you reference that file when you design the flow, and when you deploy it into a node, the node sees that it first needs to download this machine learning file before it can start executing this flow. Same thing if you have configured this Python module to use some third-party libraries, that will also be seen by the local node so that it will install that library in its local environment before executing this flow.

Dr. Göran Appelquist: 37:49 And actually also the modules that you use in these flows are downloaded dynamically. So the module library is hosted in the cloud, and when you deploy a flow like this, the node will pull down all the modules that are needed for this flow into its local environment. So all the functionality or all the intelligence, if you like, comes through these flows into these edge nodes. So whenever we add new modules into the library, offer new functionality or new connectors to new systems, this new functionality can immediately be used with any of the nodes that you have already installed in the field. So, again, the idea is that we should not have to update the containers that you have running at the edge, because that is usually much more work than changing things from the centralized cloud environment. These flows are also version controlled, so you can keep track of changes that you make over time and see when you have made updates on different nodes that you have in the field.

Dr. Göran Appelquist: 38:49 So with these tools, we have different use cases. But today we are focusing on Historian, so I just slightly changed that example that I had in the previous slide to show you how you could build something that would be like a Historian using the tools that we have seen today. So we have a similar set of inputs here, four different data sources, local machine data sources. They will produce data in different formats. We can start by harmonizing these into some common format. We do some basic aggregation to reduce the amount of data and then we send it off to the Influx database. So there is a standard connector for Influx. So it’s easy to store data in your Influx instance, which could be running on the same machine next to this container, or it could be somewhere else in your infrastructure. It doesn’t really matter. It’s the same connector. It’s easy to store data in Influx. And at the bottom, you also see an example of how you can build a flow to analyze the historical data that you have in your database. So then, you just use the Influx reader module instead to get data out of your Influx storage. And then, you can apply some kind of logic, in this case, exemplified by running a machine learning model inside the Python module to analyze your historical data, and now the assumption here is that we do some kind of anomaly detection, and when we detect some anomaly, we trigger a support ticket in a cloud-based support system, just as a simple example to give you an idea of what it would look like. So these two flows in combination, or actually just the first one if you just want the Historian, get your data into the database, and then using Grafana, you can start building your dashboards to visualize whatever data you have in the database. So it’s very straightforward to set up something like this and get it up and running in a distributed location where you need to have this kind of capability.
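
As an illustration of the kind of analysis that could run on historical data in such a flow, here is a minimal Python sketch of threshold-based anomaly detection. It stands in for the machine learning model mentioned above and is not Crosser's implementation; the function names, the sample values, and the three-sigma threshold are assumptions.

```python
from statistics import mean, stdev


def fit_baseline(history):
    """Summarize historical readings as (mean, standard deviation)."""
    if len(history) < 2:
        raise ValueError("need at least two historical readings")
    return mean(history), stdev(history)


def is_anomaly(value, baseline, threshold=3.0):
    """Flag a reading that lies more than `threshold` standard deviations
    away from the historical mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any deviation at all is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical historical values, e.g. read back from the database.
history = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 10.0]
baseline = fit_baseline(history)

# New readings to score against the baseline.
new_readings = [10.05, 25.0, 9.95]
flagged = [v for v in new_readings if is_anomaly(v, baseline)]
```

In a real flow, a non-empty `flagged` list would be the point at which a downstream action is triggered, such as the support ticket in the example above.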

Dr. Göran Appelquist: 40:58 And of course, you can have multiple instances of this. You’re not limited to one. You can decide to install this wherever it makes sense. And you can manage all of these instances from a central location and replicate the same logic. So this logic that you see at the top can easily be replicated into any number of instances. So there is a grouping concept in Crosser Cloud so that you can operate on groups of nodes, so you can create a group of all your Influx-enabled nodes and then deploy the same logic into all of those with a single operation. So it’s very straightforward.

Dr. Göran Appelquist: 41:36 So with that, I’ve reached the end of my presentation. So what I’ve tried to convey today is that by using today’s technologies, we can create similar functionality to what historians have provided for a long time, but with a new, more flexible and open architecture that meets today’s requirements, and also, in many cases, as a very cost-effective solution. So Crosser is here to help you with this and other Factory 4.0 use cases that you might have. And if you want to learn more about Crosser, you can reach out to us to schedule a demo. You can also sign up for a free trial if you want to get hands-on experience with our tools. And we also offer online training so that you can learn more about how you can use the Crosser tools. So with that, I’ve reached the Q&A section.

Caitlin Croft: 42:38 Fantastic. That was great, Göran. So just another friendly reminder, everyone. We have InfluxDays coming up. So please let me know if you’re interested in a free ticket to the Flux training, and the conference itself is completely free. The first question is why do you cleanse your data?

Dr. Göran Appelquist: 43:01 Sorry, could you say that again? I was reading questions.

Caitlin Croft: 43:04 Why do you cleanse your data?

Dr. Göran Appelquist: 43:10 Why? I would say that it depends. It’s not something that you have to do, but in many cases you get a lot of additional information that you’re not interested in, and it makes sense to remove that because you don’t want to either store or transport unnecessary data. Of course, if you just get clean data, then you don’t do it. But in most cases that we come across, you do get a lot of additional information that you might not be interested in. It could also be the other way around, that you want to extend your data by adding metadata that you may have in some other system and tag that along with the data, so that it makes it easier to use the data further down in the processing pipeline. So maybe you want to add data as well at the edge to make it easier to use the data later on. So it’s both ways, but in many cases there is additional information that is not really relevant for the use cases that you’re interested in. And then it makes sense to remove it as early as possible so you don’t waste storage and bandwidth by transporting this data around.

Caitlin Croft: 44:13 Great. Would be interested to hear about all of the aggregations that you have implemented compared to examples of other typical OPC UA, DA, and HDA aggregations?

Dr. Göran Appelquist: 44:30 Probably not at all as many as you would see in these historian systems. We cover the basic ones, but there’s a lot of- I mean, with an OPC server, we have not tried to replicate everything that you have there since you can use it in the OPC server. So we have the basic aggregations that you can do, but there’s a lot of stuff that we do not cover. And to be honest, there has been very little demand for that from our customers. So we are primarily driven by demands from our customer base. And so far, that has not been an issue. So I think we seem to cover what most customers need. But if you compare us with a traditional system, you would probably find a lot more options there than we offer. That’s where we are today. If there is a demand, I’m pretty sure that you will see more offerings from us. But right now, I would say that it’s pretty basic compared to what you see in many of the historian systems.

Caitlin Croft: 45:35 Okay. The same goes for how the API call for getting data is able to automatically provide aggregated points for trending and not relying on the actual collected data points. For example, if I’m trending a day’s work, I don’t need all 100,000 second data points as I will probably only be able to show around 500 dots on a trend.

Dr. Göran Appelquist: 46:05 Yeah, let’s see if I understand that question.

Caitlin Croft: 46:11 I can also unmute Yowan. Let me see here. Yowan, I’ve unmuted you so you can unmute yourself and talk if you want to expand on this question, if you’d like.

Attendee #1: 46:23 You guys hear me?

Caitlin Croft: 46:24 Yes, I can hear you.

Attendee #1: 46:27 So I’m the one who asked the other question too. I come from the food process industry, especially on the industrial automation side, and have worked with data historians quite a bit. And also, some years ago, we started this whole thing about cloud historians and utilizing these open-source technologies to try to replicate what it is that, for example, a PI historian or a Proficy historian is doing. And one of the things that we found is that these systems, they’re not made to replace process historians, right? They’re made for time series data in general. And from what I’ve seen, for example, when we start to do trending, some of the smart stuff put into the historian, because it’s based on on-premises technologies and old technology still, right, is that it won’t send 100,000 data points to the trend tool for it to then figure out what to actually show. It will only send 500 points based on the API request and information about how much the client can actually show. So all the smarts and efficiency is put into the API on the server side and not on the client side. And so showing a trend of a whole day or a week or a month can take quite a bit of time if your system is not optimized to internally process that data and send the right data to the trend tool. So that’s why I’m looking at- those are some of the key elements of a process historian in the process industry, that it has the aggregation that you expect, but also has smarts around, I want to be able to- within two seconds, I want to see the trend, right?
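
The server-side reduction the attendee describes, returning roughly 500 points for a trend instead of all 100,000 raw points, can be sketched in Python as below. This is a generic bucket-averaging approach, not the method of any particular historian; the function name and the sample data are assumptions. In InfluxDB itself, a comparable effect is usually achieved at query time, for example with the Flux `aggregateWindow()` function.

```python
def downsample(points, max_points=500):
    """Reduce a time-ordered list of (timestamp, value) points to at most
    `max_points` by averaging equally sized buckets of consecutive points."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    n = len(points)
    if n <= max_points:
        return list(points)
    result = []
    for bucket in range(max_points):
        # Integer arithmetic spreads any remainder evenly across buckets.
        start = bucket * n // max_points
        end = (bucket + 1) * n // max_points
        chunk = points[start:end]
        average = sum(value for _, value in chunk) / len(chunk)
        # Represent the bucket by its first timestamp and its average value.
        result.append((chunk[0][0], average))
    return result


# Hypothetical day of data: 100,000 points with a repeating 0..9 pattern.
day = [(t, float(t % 10)) for t in range(100_000)]
trend = downsample(day, max_points=500)
```

Averaging hides short spikes, so a trend view that must show extremes would typically keep the minimum and maximum of each bucket as well.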

Dr. Göran Appelquist: 48:17 I mean, we don’t have those tools for sure today, and I think to put that in some context, I don’t think that we want to position what we have talked about today as a full replacement for Historians. I mean, there are a lot of features in traditional Historians that we don’t cover. What we see, and the reason why we built this, is that we have several customer cases where they want to build something that is, you can probably say, a simpler Historian, but that covers their use case, so they don’t want to bring in a full-fledged Historian because they don’t need all those capabilities. So I think if you compare it head-to-head, yes, there are lots of features that you will not get with this type of solution, and there are features that you will get that you might not have in the historian solutions. But we don’t want to position this as a full-fledged replacement for everything that is out there. I mean, these are systems that have been around for many years and have developed features that are needed that are not necessarily available with this type of solution. But I think there are use cases where this makes a lot of sense, and it’s very easy to get set up and a good complement if you have something like Crosser running close to your machines. This is a good complement to get additional features on top of that. So that is sort of the message that we want to convey.

Caitlin Croft: 49:48 Awesome. Yowan, do you have any other follow-up questions?

Attendee #1: 49:51 No, I’m good, thanks. I’m interested to see what the level of when you call the Historian, right, and not a time-series database, then there’s a thought around you looking at what traditional historians are doing today and where they’re going. And as I said, I’ve been involved in my prior life a little bit on replacing classic Historians with [inaudible] historians and gotten into the same issues that there’s a lot of work there to close those gaps that a [inaudible] Historian today is very good at.

Dr. Göran Appelquist: 50:28 Yeah, no, I fully agree.

Caitlin Croft: 50:30 Right. Next question. How do you handle unit conversion? So, for example, Celsius to Fahrenheit or using a different calibration model?

Dr. Göran Appelquist: 50:45 Yes, so there are modules in our library that you can use to scale data, and of course, it needs to be- if it’s a fixed conversion between Celsius and Fahrenheit, then you don’t need any additional data. But if it’s other types of scaling, then you need to provide the scaling parameters with your data tags, but you can do that when you define the tags that you read out of the machines in our system and then feed that stream of data through a module in our library that you can do generic calculations with and just combine those values to create whatever transformations you need. So that can be done in a streaming manner, but it requires that you have the relevant scaling factors from somewhere, either that you know what type of conversion you need to do or that you provide them as additional metadata together with your data points. But when you read data from- I mean, that’s one of the issues when you read data from legacy systems like PLCs. Then, you typically just get the sensor value or whatever value it is. There is no metadata associated with that. But then you have the option of adding that kind of metadata when you define the tag for the registers that you read out of the PLCs in our system so that you can tag along additional parameters that you can use to do, for example, scaling or unit conversions on your data.
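
The two cases in this answer, a fixed unit conversion and a scaling that depends on metadata attached to the tag, can be sketched in Python as follows. This is not Crosser's calculation module; the tag dictionary layout and the `scale` and `offset` parameter names are assumptions for illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Fixed conversion: needs no extra metadata."""
    return celsius * 9.0 / 5.0 + 32.0


def apply_scaling(raw_value, tag):
    """Linear scaling of a raw register value using parameters stored
    alongside the tag definition (defaults leave the value unchanged)."""
    return raw_value * tag.get("scale", 1.0) + tag.get("offset", 0.0)


# Hypothetical tag definition: a PLC register delivers a raw integer, and
# the scaling parameters are attached as metadata when the tag is defined.
oven_temp_tag = {"name": "oven_temp", "scale": 0.1, "offset": -40.0, "unit": "C"}

raw_register_value = 1250
temp_c = apply_scaling(raw_register_value, oven_temp_tag)
temp_f = celsius_to_fahrenheit(temp_c)
```

Keeping the scaling parameters with the tag means the same calculation step can serve every register, since the raw PLC value itself carries no metadata.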

Caitlin Croft: 52:14 Are you using Telegraf at all?

Dr. Göran Appelquist: 52:19 No, we are not.

Caitlin Croft: 52:22 How long does it take to set this up for a typical factory? So, for example, if you look at a factory with 50 production machines?

Dr. Göran Appelquist: 52:33 I mean, setting up the software that you’ve seen today is very straightforward. I would say what takes time is to get the infrastructure in place. You need the Docker environments where you can run these Docker containers. And then, of course, it depends on the number of instances. But once you have that, installing these containers and getting the system up and running, that’s, yeah, I don’t know if I can give you a number, but I mean, it’s a very smooth process to get this up and running. The Crosser container, you just install using the standard operations and it registers itself with the cloud service and then you’re up and running. Then, of course, you need to build these flows to collect your data. And then, it depends on how complex your machine setup is, what type of connections you have, and how many data points you’re reading out and so on. But yeah, difficult to give you a number, but I would say that yeah- I guess you want the number, but I don’t really know what to say there. We’ve seen this being up and running in a day, definitely. But again, what takes most time is to get the initial infrastructure in place, the Docker environments and network connectivity so that you can reach the data sources. Once you have that done, it’s typically straightforward to get the software components that I’ve been talking about today up and running.

Caitlin Croft: 54:01 Okay. Is it possible to model assets with Crosser?

Dr. Göran Appelquist: 54:09 Model assets? I don’t think I understand that question.

Caitlin Croft: 54:20 Okay, Antonio, I just allowed you to unmute if you would like to expand on your question? I don’t think we can hear you. Well, while we wait for Antonio to get the mic working, next question for you, Göran, how do you differentiate against NodeRed?

Dr. Göran Appelquist: 55:07 Now, that’s one of our favorite questions, and I would say that the main difference- as we’ve seen, the design tool is, of course, very similar. So when it comes to how you design stuff using this graphical editor, it’s very similar to NodeRed. But the main difference here is that we have decoupled the designer from the node itself. So instead of running the design tool on each of the individual nodes, you now develop these flows centrally without any connection to where they eventually will run. And then, once you design something, you can then deploy it anywhere. You can deploy it into one node or you can deploy it into 100 nodes replicating the same flow. And also when you do that, you can parameterize these flows so that you can get local adaptation based on where you deploy it. So let’s say that you have PLC connectors with different IP addresses in different environments. You can then have that as a local setting that would be adapted when you deploy it into each of the nodes so that you can easily reuse the same flow in many locations. So this decoupling is one thing. And the whole deployment tool suite that we have with version control of these flows and how you manage nodes is something that differs from NodeRed.
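
The idea of one centrally designed flow that is adapted with local settings at deployment time can be sketched in Python as below. This does not reflect Crosser's actual flow format or deployment API; the dictionary layout, the `{placeholder}` syntax, and all node names and addresses are hypothetical.

```python
def render_flow(flow, node_params):
    """Return a copy of the flow in which '{name}' placeholders in string
    settings are replaced by the values defined for one node."""
    rendered = {}
    for module_name, settings in flow.items():
        rendered[module_name] = {
            key: value.format(**node_params) if isinstance(value, str) else value
            for key, value in settings.items()
        }
    return rendered


# One flow definition, designed once (hypothetical structure).
flow = {
    "plc_input": {"host": "{plc_ip}", "port": 102},
    "influx_output": {"url": "{influx_url}", "bucket": "historian"},
}

# Local settings per node (hypothetical values).
nodes = {
    "line-a": {"plc_ip": "192.168.10.5", "influx_url": "http://localhost:8086"},
    "line-b": {"plc_ip": "192.168.20.5", "influx_url": "http://10.0.0.7:8086"},
}

# "Deploying" the same flow to every node in the group.
deployed = {name: render_flow(flow, params) for name, params in nodes.items()}
```

The flow definition stays identical across locations; only the per-node parameter set differs, which is what makes replication to many nodes a single operation.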

Dr. Göran Appelquist: 56:28 And then there are also some technical differences. I mean, our runtime is built on .NET Core. So we have some performance benefits. That said, it might be important or it might not be important depending on your use cases. But that’s also a difference between the technologies that we have been using. But I would say the main difference is this decoupling of flow design and deployment. That’s where we differentiate against NodeRed. And NodeRed, I mean, we really like NodeRed. It’s a great tool. And it’s also actually one of our best lead generators because users that have started with NodeRed often get stuck when they come to this place where they want to deploy it into production at scale. And that’s where we can help out and take the same concept, but bring it into a production environment at scale. So, yeah.

Caitlin Croft: 57:28 So we are slowly running out of time. We’ll take a couple more questions, and then for any questions that we don’t get to, I’m happy to connect you with Göran, who can answer them. I assume that the timestamps are all in UTC and are converted to local time on the client’s computer. Is that correct?

Dr. Göran Appelquist: 57:48 Correct. So we typically work with UTC timestamps. Yes.
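
Storing timestamps in UTC and converting them to local time only on the client, as confirmed here, looks roughly like this in Python. The specific timestamp and the fixed +02:00 offset are assumptions for illustration; a real client would normally use its system time zone or a named zone rather than a fixed offset.

```python
from datetime import datetime, timedelta, timezone

# A stored sample timestamp, kept in UTC.
stored_utc = datetime(2021, 5, 5, 15, 0, 0, tzinfo=timezone.utc)

# Hypothetical client time zone: a fixed +02:00 offset, labeled CEST.
client_zone = timezone(timedelta(hours=2), "CEST")

# Conversion happens only when displaying the value; the stored
# timestamp itself stays in UTC and still denotes the same instant.
displayed = stored_utc.astimezone(client_zone)
displayed_text = displayed.isoformat()
```

Because both objects refer to the same instant, data from nodes in different time zones stays comparable in the database.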

Caitlin Croft: 57:56 Oh, okay, so follow up to Antonio’s question, is it possible to model assets with Crosser, he’s trying to understand to model an equipment like a wind turbine with attributes and analysis.

Dr. Göran Appelquist: 58:15 Yeah, I’m not sure I completely follow, but let me try to say something there. I mean, you can- so we are about collecting and analyzing data, and you can create whatever type of data model you want using our tools, but you have to build the flows to create that data model. It’s not like we have a predefined data model that we can generate, but that’s actually a use case that we see with customers where they have defined some internal data model that they want to harmonize all their machine data according to. And then they use our tool and the modules we provide in our library to restructure whatever data they get off the machines so that it fits into that predefined data model. But it’s something that you need to do by building a flow and using the modules that are available for restructuring data. But with those modules, you can create basically any type of data structure. I don’t know if that is what you mean with a model of your engine, but we are about doing this restructuring on the fly on the data as it passes through. So then, we can send it somewhere where you can use it to build a digital twin or something like that. But that is something that is outside of the Crosser solution.

Caitlin Croft: 59:43 Great. Final question for today. Can you go over just a little bit of how to write data to InfluxDB? Is it over the internet as well as the local LAN?

Dr. Göran Appelquist: 59:56 Yes, so it’s always a network connection, even if it’s local. So you point out your Influx instance using a URL so it can sit anywhere. Could be local or it could be on the internet or somewhere. So, yes, you can do both.
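
To illustrate what writing to an InfluxDB instance addressed by a URL involves, here is a minimal Python sketch that formats a point in InfluxDB line protocol and prepares an HTTP request for the InfluxDB 2.x write endpoint (`/api/v2/write`). It is not the Crosser connector; the measurement, tag, org, bucket, and token values are hypothetical, only float fields are handled, and the request is built but deliberately not sent.

```python
import urllib.request
from urllib.parse import urlencode


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as InfluxDB line protocol (float fields only)."""

    def esc(text, chars):
        # Line protocol escapes special characters with a backslash.
        for ch in chars:
            text = text.replace(ch, "\\" + ch)
        return text

    head = esc(measurement, ", ")
    tag_part = "".join(
        f",{esc(k, ',= ')}={esc(v, ',= ')}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{esc(k, ',= ')}={float(v)}" for k, v in fields.items())
    return f"{head}{tag_part} {field_part} {timestamp_ns}"


def build_write_request(base_url, org, bucket, token, lines):
    """Prepare (but do not send) a write request for InfluxDB 2.x."""
    query = urlencode({"org": org, "bucket": bucket, "precision": "ns"})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v2/write?{query}",
        data="\n".join(lines).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )


line = to_line_protocol(
    "machine_temp",
    {"sensor": "temp-1", "line": "a"},
    {"avg": 21.5, "max": 22.0},
    1620226800000000000,
)
# The same code works for a local or a remote instance; only the URL changes.
request = build_write_request(
    "http://localhost:8086", "my-org", "historian", "example-token", [line]
)
# Sending would be: urllib.request.urlopen(request)
```

An InfluxDB 1.x instance uses a different write path and authentication, so the URL construction here would need to change for that version.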

Caitlin Croft: 01:00:15 All right. Thank you, everyone, for joining today’s webinar. I know there were tons of questions. So if you have any other questions that you think of, or questions that we weren’t able to get to, please feel free to email me. I’m happy to connect you with Göran, and he can help you out. Once again, I’m just going to remind everyone we have InfluxDays coming up, and I hope to see you all there. And the session has been recorded and will be made available for replay this evening. So you’ll be able to watch it. So on May 10th and 11th, we have the hands-on Flux training. And if you’re interested, I’m happy to get you a free ticket. And then on May 18th and 19th is the actual conference, and the conference itself is completely free. So we hope to see you all there. And thank you for joining today’s webinar.

[/et_pb_toggle]

Dr. Göran Appelquist

CTO, Crosser

Göran started his career in the academic world, where he earned a PhD in physics researching large-scale data acquisition systems for physics experiments, such as the LHC at CERN. After leaving academia, he has worked in several tech startups in different management positions over the last 20 years.

His passion is learning new technologies, using them to develop innovative products, and explaining the solutions to end users, technical or non-technical.
