Modernizing Your Data Historian with InfluxDB
Session date: Nov 05, 2024 08:00am (Pacific Time)
A data historian is a type of software designed for capturing and storing time series data from industrial operations. They are often a key part of the Industrial IoT ecosystem, where numerous devices and systems generate continuous data streams. Data historians are ideal for industrial automation and process control, whereas a time series database (TSDB) handles any data with a timestamp. InfluxDB is the purpose-built time series database used to collect, analyze, and store metric, event, and tracing data. With InfluxDB 3.0, developers can ingest billions of data points per second with unlimited cardinality.
In this webinar, learn how to modernize your current data historian with InfluxDB. With InfluxDB, customers gain the flexibility that comes with a cloud-native solution—including the wider ecosystem with 300+ integrations.
During this live session, Anais Dotis-Georgiou will dive into:
- TSDB vs. historian considerations
- InfluxDB 3.0 overview: Product overview and key features
- Live Demo: Architecture overview and tips/tricks
Watch the Webinar
Watch the webinar “Modernizing Your Data Historian with InfluxDB” by filling out the form and clicking on the Watch Webinar button on the right. This will open the recording.
[et_pb_toggle _builder_version="3.17.6" title="Transcript" title_font_size="26" border_width_all="0px" border_width_bottom="1px" module_class="transcript-toggle" closed_toggle_background_color="rgba(255,255,255,0)"]
Here is an unedited transcript of the webinar “Modernizing Your Data Historian with InfluxDB.” This is provided for those who prefer to read rather than watch the webinar. Please note that the transcript is raw. We apologize for any transcription errors. Speakers:
- Anais Dotis-Georgiou: Developer Advocate, InfluxData
ANAIS DOTIS-GEORGIOU: 00:04
Hello, and welcome, everybody. Today, we’re going to be talking about data historians and InfluxDB. I’m going to give everyone a couple minutes to roll in and get settled and just kind of have a chance to get comfy. And I also want to go over some initial just housekeeping rules before we get started, which is that a copy of this presentation will be made available to you at the end of this webinar. It should be sent out to your email, so keep an eye out for that. And I also want to encourage you to ask any questions that you might have about this webinar, or anything related to InfluxDB, either on our community Slack or our community forums at community.influxdata.com. And like I mentioned, I will give people just a couple minutes to roll in and for myself to pull up my slides. And my name is Anais Dotis-Georgiou, and I’m a developer advocate here at Influx. And I’ve been at Influx for almost six years now. So, a good amount of time. And I have to say, Influx has changed dramatically over that time. So, it’s almost felt like I’ve been sometimes at like three different companies throughout that time just because we’ve had so many different product releases.
ANAIS DOTIS-GEORGIOU: 01:23
And I’m also based out of Austin, Texas, for anyone who’s interested. I always enjoy hearing where everyone else is from too. So, if you feel like sharing where you’re from in the chat, I’d love to learn. And I’m still just finding my slides and getting them set up. So, give me one second. Germany, India. Thank you so much for sharing. Belgium, very cool. North Dakota. Hello. Welcome. Welcome, everyone. Switzerland. I want to go back to Switzerland. Oh, and yes, of course, one last thing. Feel free to ask any questions that you might have either in the Q&A or the chat. I will try to get to all your questions at the end of this webinar. So, thank you so much, everyone, for joining us today. We’re going to be talking about modernizing your data historian with InfluxDB. This was originally presented by Mike Devir, a Solutions Engineer, but I’m going to be presenting it today. My name is Anais Dotis-Georgiou, and I am a developer advocate at InfluxData. InfluxData is the creator of InfluxDB, and I’ll talk more about InfluxDB and what it is in just a second. And I am a developer advocate. And for those of you who don’t know what developer advocacy is, in a nutshell and in a very vague nutshell, it’s someone who represents the company to the community and the community to the company.
ANAIS DOTIS-GEORGIOU: 02:55
So, what that really means in practice is that I do things like this. I give webinars to help educate the community about InfluxDB. I create technical presentations and tutorials or demos. You can find all those demos in the InfluxCommunity org on GitHub. And I also write blogs and produce YouTube videos and answer community questions online. So, if you have any questions about InfluxDB, please ask them. And more than likely, either I or another developer advocate will be answering them. So why am I here today? I’m here to help provide a high-level understanding of the differences between data historians and time series databases like InfluxDB. I’m also going to give you an overview of some of the benefits of a time series database and InfluxDB so that you can have a better understanding of why you might want to use it. So, for the agenda today, we’re going to start off by talking about data historians versus InfluxDB and some of the differences there. Then we’re going to talk about InfluxDB itself, what it’s built on, what makes it different from previous versions of InfluxDB, and where it’s going in the future. And then we’re going to talk about some integrations and partners. And last but not least, some customer examples. And then I’ll open the floor for Q&A so you can ask any questions that you might have about anything related to time series or InfluxDB. Please feel free to keep it broad.
ANAIS DOTIS-GEORGIOU: 04:24
So, let’s talk about data historians and time series databases. So, a data historian is a specialized database for industrial settings. It’s typically developed for on-prem deployment to meet the security demands of operations teams in industrial settings. And they’re designed for collecting, storing, and retrieving high-frequency timestamped data. And a time series database is more general purpose than a data historian. So, it’s for storing any data with a timestamp. It can sometimes even include logs and traces. But both data historians and time series databases serve the purpose of efficiently storing and retrieving timestamped data. So, what are some of the pros of a data historian? Well, it’s highly tailored for operational teams in manufacturing settings. So, historians were built by operations teams from the ground up for operations team networks. And they work really nicely with industrial systems and industrial software needs and whatever standards you might have. Also, they are an end-to-end solution. So, they offer additional important functionality for industrial IoT use cases, like dashboards, and maybe PID controller integrations as well, things like that.
ANAIS DOTIS-GEORGIOU: 05:52
One of the cons is rigid legacy tech. So, because they’re usually proprietary tech, built, you know, like a library built only for librarians, for example, it can be extremely challenging for an organization to enable other roles, like IT engineers, data scientists, maybe some business analysts, to perform functions with the data historian that it wasn’t originally designed for. So, in a nutshell, the interoperability isn’t very good. And so, it hampers teams’ ability to adapt, to innovate, and to grow as needed and to incorporate other tech stacks with the data historian. As a result, they’ve remained largely unchanged or untouched for decades because they’re built on these proprietary standards. So again, it’s just hard to develop against them and integrate them into modern tech stacks. They’re also on-prem, closed systems. And this generally creates, again, these kinds of silos and connectivity issues as well. And another huge issue is just vendor lock-in, which creates an unbalanced power dynamic with suppliers. And again, it closes you off from using other technologies that might be helpful for solving your problems on the manufacturing floor.
ANAIS DOTIS-GEORGIOU: 07:20
Let’s talk about time series databases and some of their pros and cons. So, one big pro of time series databases is their application versatility. Because time series databases are meant to be general purpose, and in InfluxDB’s case, we offer hybrid deployments with a schemaless design, you get that flexible data model, and you aren’t really locked into a proprietary way of writing data for a specific IoT use case. That means it can constantly be adapted and expanded so that it can meet your growing needs. So, for example, if you start with an industrial IoT use case and then maybe you also need to write a few logs, you can do that. And so, it can grow as your needs change as well. It also gives you development agility. Because it isn’t legacy tech and you can make use of open standards and protocols, you can easily develop against it without requiring specialized industrial skills. So that just helps improve your rate of development and your capacity to innovate and to build other solutions on top of your time series database.
ANAIS DOTIS-GEORGIOU: 08:38
Some pros that are specific to InfluxDB are that it supports writing time series data at a massive scale and supports unlimited cardinality. So, what that means is that you don’t have to worry about the dimensionality of your database and your specific table. And so, you don’t have to worry about how you’re going to design your schema. And that just provides you more flexibility to add more metrics, more machines, more devices as you grow and scale. We also, like I said, have hybrid deployments. We have edge, on-prem, cloud. We have features for writing from the edge to the cloud so that you can consolidate your time series data in the cloud. That feature specifically is called edge data replication, and it has store-and-forward capabilities, with caching and buffering capabilities as well. So, if your data warehouse is offline, you can still make sure that you are collecting that data and write it to your data warehouse, or to InfluxDB Cloud, as needed when it comes back online. We also have flexible schema on write. We do have hot and cold storage tiers, which means that when you want to query your more relevant recent data, you’re querying it from hot storage, and you are able to perform really efficient, fast queries. We also have a really vast community and network of integrations and partners.
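The store-and-forward behavior described above can be sketched in a few lines of Python. This is purely an illustration of the buffering concept, not InfluxDB’s actual edge data replication implementation; the class and its names are hypothetical.

```python
from collections import deque

class StoreForwardBuffer:
    """Illustrative store-and-forward queue: buffer points while the
    destination is offline, then flush them in order once it is
    reachable again. (A sketch of the concept only, not InfluxDB's
    edge data replication code.)"""

    def __init__(self, max_points=10_000):
        self.queue = deque(maxlen=max_points)  # drop oldest if full

    def write(self, point, destination_online):
        if destination_online:
            flushed = list(self.queue) + [point]
            self.queue.clear()
            return flushed          # everything delivered, in order
        self.queue.append(point)    # destination down: buffer locally
        return []

buf = StoreForwardBuffer()
assert buf.write("m1 temp=20", destination_online=False) == []
assert buf.write("m1 temp=21", destination_online=False) == []
delivered = buf.write("m1 temp=22", destination_online=True)
print(delivered)  # ['m1 temp=20', 'm1 temp=21', 'm1 temp=22']
```

The bounded deque is the design point: an offline edge node keeps collecting without growing memory without limit, at the cost of dropping the oldest buffered points when full.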
ANAIS DOTIS-GEORGIOU: 10:06
So, we have JDBC drivers that enable you to visualize data in things like Tableau or Superset. We also have Grafana integrations. We have integrations with a bunch of stream processing frameworks as well, like Bytewax and Quix, so that you can perform ETL operations with tools like that. In Quix’s case, for example, we have source and sink connectors. And Quix is basically a Python wrapper for Kafka Streams, in a sense. So, it abstracts away all of writing and using Kafka yourself. We have interoperability. We’re looking to build interoperability with Apache Iceberg as well, which means that you’ll be able to read your InfluxDB Parquet files directly from Iceberg, or directly from a data warehouse of your choice, so that you don’t have the vendor lock-in that you do have with data historians. We really want to help enable developers with InfluxDB v3 to get any of their time series data into InfluxDB and then quickly take it out, so that they can use it alongside a variety of other data they might have to actually derive actionable insights from that data.
ANAIS DOTIS-GEORGIOU: 11:30
And those are just some of the interoperability features and tools that I mentioned, but we have a lot more as well that I’ll get into. So, some cons of time series databases are that they’re not domain-specific, right? So, if you are looking for a solution like DeltaV, for example, that also has specific control equations, parameters, and various functions that are specific to the chemical industry, that’s not InfluxDB. So, what you gain in flexibility in terms of development is what you lose in some of the out-of-the-box specific features. So that’s where you have this sort of build-versus-buy comparison that you need to perform for yourself. What I’ve also seen a lot of people do is use time series databases alongside data historians when they want to supplement a feature that they don’t already have in their data historian. So, there are some really cool use cases of people using InfluxDB first as community members for at-home hobby projects, monitoring things like their at-home barbecue setup or their home beer brewing setup. I’ve even had customers use InfluxDB to create sort of a control feedback loop, monitoring their own pools and then automating the chlorine levels that they need to supplement in their pool, and build solutions like that. And then they realize that it’s easy to use InfluxDB.
ANAIS DOTIS-GEORGIOU: 13:06
It’s easy to build on at home. And then they take that to their workplace, where they know that if we were also monitoring the temperature here on this device and integrating that into our workflow, then we would be able to increase efficiency on this part of the manufacturing floor by some degree. And so, it’s been cool to see users transition and grow like that. And that brings me to the future of industrial data in Industry 4.0. So, we’re living in the age of Industry 4.0, where connectivity, automation, and data are reshaping industries. And so, as the pool of available data exponentially grows, and as industrial tech and connectivity rapidly evolve, so does the opportunity to improve and uncover value. This becomes extremely true when we think of things like generative AI, and specifically things like time series LLMs, where they’re able to perform zero-shot forecasting extremely efficiently, especially if you give them, or train them on, industry-specific time series data sets. And I think that’s also one of the main places where AI is lagging, in the sense that there are a lot of publicly available text-based data sets, but there are fewer time series data sets.
ANAIS DOTIS-GEORGIOU: 14:31
And as those become more publicly available, and as we start leveraging those in a collective way and applying them, that’s when we start to see some of these digital transformations and a shift in how we think about how things are run, and how we can enable organizations to take advantage of time series data and leverage it in industry. Because, you know, there are only so many ways that you can brew beer. And there are only so many time series data sets about how beer gets brewed and what the data looks like coming out of that, because it is bound by physical limitations in the system that are well understood, have clearly defined parameters, and generally produce similar data. So as all that data starts to get leveraged together, we’ll see, I think, a big shift in how it’s used. So, kind of looking into the future, we see leaders in industries looking to build their own solutions, looking to leverage maybe more public data sets, or even data sets shared across organizations through cross-organizational collaboration, to leverage things like ML and cloud computing so that they can digitally transform their business, improve operations, and reduce some of these costs. So, for operational efficiency specifically, we at Influx like to revolve everything around the challenges and pains of that particular industry segment.
ANAIS DOTIS-GEORGIOU: 16:07
And so, we really want to focus on real-time analytics and an in-memory data cache, and that’s all based on Apache Arrow, which is our columnar in-memory data format. And that also helps increase interoperability with analytics and streaming and alerting tools that also leverage Apache Arrow. So, I also want to take some time to talk about some of the typical users. We have operational technology engineers and OT site managers, IT architects, IT project managers, DevOps engineers, and software engineers. And the first challenge is around just the data volume, the dimensionality, and the resolution. And the outcome there is to perform real-time monitoring of all the operational technology machines, all the different devices on the factory floor, and create capabilities that map to this, specifically purpose-built time series storage, so that we can write really high volumes with unlimited cardinality and not have to worry about the schema of our database or even really the volume. The second challenge is usually around equipment uptime and performance. So, we also want to address quality control issues. And the outcome is to improve overall equipment effectiveness. And we want to reduce downtime, increase output, and optimize processes. And so that involves really being proactive about maintenance operations.
ANAIS DOTIS-GEORGIOU: 17:47
And here, InfluxDB is able to help by providing capabilities like API integration. And that helps you automate predictive maintenance and other processes and analytics by providing real-time insights, providing subsecond query responses, and allowing you to have interoperability with other tools through the API. And a third thing that I want to talk about is increased frequency and retention that leads to higher storage costs. So, the outcome is to lower those costs. And InfluxDB IOx, or InfluxDB 3.0, really can leverage high degrees of compression and dictionary encoding to reduce the storage costs and the effort to query historical data as well. So especially with time series data, you want to be able to do things like pull the last record, no matter when that last record was written, in a particular database or table. You’re going to want to be able to perform really fast scans across your entire data set to find things like the global minimum and maximum. And so InfluxDB 3.0 is built with that in mind, largely because it is built on all-columnar formats and also because we enable things like custom partitioning, so that you can make sure you can retrieve historical data very quickly.
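The query patterns just described, pulling the last record regardless of when it was written and scanning for a global minimum and maximum, can be illustrated with a toy in-memory example. This is plain Python over made-up rows; InfluxDB 3.0 executes these as columnar SQL queries, so this only sketches the semantics.

```python
# Toy rows: (measurement, tag, unix_time, value) -- note they arrive out of order.
rows = [
    ("pressure", "pump-1", 1700000300, 4.1),
    ("pressure", "pump-1", 1700000100, 3.9),
    ("pressure", "pump-2", 1700000200, 5.2),
]

def last_value(rows, tag):
    """Most recent value for one series, regardless of write order."""
    series = [r for r in rows if r[1] == tag]
    return max(series, key=lambda r: r[2])[3]

def global_min_max(rows):
    """Full-scan min and max across every series."""
    values = [r[3] for r in rows]
    return min(values), max(values)

print(last_value(rows, "pump-1"))   # 4.1 (latest timestamp wins)
print(global_min_max(rows))         # (3.9, 5.2)
```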
ANAIS DOTIS-GEORGIOU: 19:27
So that being said, I want to talk more about InfluxDB 3.0. The new database and storage engine itself launched in 2023, and it forms the core of the new InfluxDB 3.0 platform. And I want to talk about how it solves some of these storage issues, some of these query issues, cardinality issues, and some of these ecosystem issues in terms of providing interoperability with other tools. So, first and foremost, know that you can run InfluxDB in the cloud with Cloud Serverless, which is pay-as-you-go pricing, and with Cloud Dedicated, a cluster for just your own workload. We also offer InfluxDB Clustered on-premises, which is basically the evolution of our clustered Enterprise product, as well as running it at the edge. And OSS will also be available at the beginning of this year. So, with 3.0, we really did focus on a columnar database for high performance and low storage. And columnar databases are specifically well suited for time series data precisely because they enable really cheap compression. And that’s because when you’re monitoring something in the physical world, a lot of times you’re monitoring that environment with the expectation or hope that you are controlling a particular physical variable, whether that’s flow rate, concentration, pressure, temperature, etc. And one of the goals typically is either to maintain that pressure at a certain level for an extended period of time, or with the knowledge that it will exhibit a particular controlled pattern.
ANAIS DOTIS-GEORGIOU: 21:04
And so that means that in practice, you’re going to be getting a lot of the same pressure or temperature values stored in your time series database as a result, because you are controlling those values. So, when you get a lot of similar or identical values in a columnar database, that enables you to use things like dictionary encoding and really cheaply compress, say, 100 rows of the same temperature value. So, another thing about InfluxDB and columnar databases in general is that the way InfluxDB works is that live data is going to land in the memory tier. And that’s built with Apache Arrow, which is that columnar in-memory format. And this helps make queries for recent data very fast, in fact, like 10 times faster than v2. And then as data ages, that data is persisted into the Apache Parquet format. And this is another columnar format that is known for really high compression. And furthermore, these files are persisted in an object store, such as Azure Blob Storage. So, you don’t have to choose between reducing your storage costs and retaining data for longer periods.
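A minimal sketch of dictionary encoding, the trick mentioned above, on a controlled-process column. This is plain Python for illustration only; Parquet’s actual encoder is more sophisticated, but the idea is the same: repeated values collapse into a small lookup table plus integer codes.

```python
def dictionary_encode(column):
    """Replace repeated column values with small integer codes plus a
    lookup table -- the same basic idea columnar formats use."""
    dictionary, codes = [], []
    index = {}
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes

# A controlled process keeps emitting the same setpoint temperature:
temps = [72.0] * 97 + [72.5, 73.0, 72.0]
dictionary, codes = dictionary_encode(temps)
print(len(dictionary))  # 3 distinct values stand in for 100 readings
```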
ANAIS DOTIS-GEORGIOU: 22:20
And lastly, in 3.0, we also eliminated cardinality constraints, and we partition by time. So, we use a catalog service to track statistics of Parquet files and to do optimized compactions on those Parquet files. And so, sorting and cardinality are no longer a concern for InfluxDB 3.0. And by creating these catalogs, we can also leverage things like Iceberg. And then that allows you to read the Parquet files from any data warehouse that you might have, so that you can easily access that data and use it with any other tools or any other data that you might have, so that you can perform a full analysis. So, for example, if you were monitoring an LLM, which is a completely different use case, you might use a document store for some of the actual text. You might use a vector database for the embeddings. But then you would use a time series database for all your usage metrics and resource metrics. And so, you’d want to be able to consolidate all that information in one place, and being able to have this catalog service and leverage things like Iceberg enables you to do something like that. It also means that you can store all types of metrics. You can even store events and traces, if you need to, alongside your time series data in one data store without having to worry about performance and scale. So unlimited cardinality with InfluxDB allows customers to capture all the required metadata as tags for their sensors and devices without limitations, and that improves efficiency.
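The catalog idea can be sketched like this: keep per-file time statistics and prune any file that cannot contain the query window, so a query never has to scan every Parquet file. The file names and statistics below are hypothetical, purely for illustration.

```python
# Hypothetical catalog: per-file min/max timestamps, like the statistics
# a catalog service tracks for time-partitioned Parquet files.
catalog = [
    {"file": "2024-01.parquet", "min_time": 1704067200, "max_time": 1706745599},
    {"file": "2024-02.parquet", "min_time": 1706745600, "max_time": 1709251199},
    {"file": "2024-03.parquet", "min_time": 1709251200, "max_time": 1711929599},
]

def files_for_range(catalog, start, end):
    """Prune files whose time range cannot overlap the query window."""
    return [entry["file"] for entry in catalog
            if entry["max_time"] >= start and entry["min_time"] <= end]

print(files_for_range(catalog, 1706800000, 1706900000))  # ['2024-02.parquet']
```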
ANAIS DOTIS-GEORGIOU: 24:07
So, in the past, if you were an InfluxDB 2 user, you did have to worry about what you made a tag versus a field, because only tags were indexed. But that meant that tags were contributing to the dimensionality, or cardinality, of your data. And you had to worry about the scope of those tag values and whether they could explode and then create cardinality problems. But now you don’t have to worry about that anymore. We also have our API integrations. So, we have InfluxDB APIs that enable interoperability with other ML tools. That enables customers to automate their predictive maintenance and in general improve their equipment effectiveness. Additionally, InfluxDB persists data as Apache Parquet files, which, with the help of Iceberg as well, allows interoperability with machine learning and data science tools. Many of them allow you to forecast on top of Parquet or otherwise leverage Parquet. Also, our in-memory columnar storage, just as a refresher, caches recently ingested data in that in-memory hot tier as Apache Arrow. And with this hot tier, you can easily query your data and get things like global and local minimums from your data within seconds.
ANAIS DOTIS-GEORGIOU: 25:36
So, it provides really low-latency analytical queries. Additionally, you’re able to persist data in a low-cost object store. And that just lowers all your storage costs. And customers can meet their long-term data retention requirements that way as well. There’s also zero retrieval cost capability. So unlike data historians, where there is a huge effort to load data from a historical archive and place it somewhere else, InfluxDB allows customers to query their historical data like any other data with no additional effort, and even put it into a data warehouse and leverage it with other tools. I will say that the Iceberg integration is a feature that is coming very shortly, in the next couple of months. So specifically leveraging Iceberg isn’t currently available, but it will be very soon. We also have schema-on-write capability. And that just means that you don’t have to have a predefined schema to ingest your data. And that really helps developer happiness. Because if you’re having to constantly consult a schema, or you realize that your business needs change and now suddenly your historical schema doesn’t work with your new needs, that just creates a whole bunch of different headaches.
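Schema on write is visible in InfluxDB’s line protocol itself: each write declares its own tags and fields, so new ones can simply appear on later writes without a migration. Here is a small helper that builds a line protocol string. This is a sketch; the official client libraries do this for you, and this version only handles float and string fields (no escaping of special characters).

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build an InfluxDB line protocol record: no schema has to be
    declared up front -- new tags or fields can appear on any write.
    (Illustrative only: handles just float and string field values.)"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(
        f'{k}="{v}"' if isinstance(v, str) else f"{k}={v}"
        for k, v in fields.items()
    )
    return f"{measurement},{tag_str} {field_str} {timestamp_ns}"

line = to_line_protocol(
    "boiler", {"site": "plant-3", "unit": "b7"},
    {"temp": 88.1, "state": "heating"}, 1700000000000000000,
)
print(line)
# boiler,site=plant-3,unit=b7 temp=88.1,state="heating" 1700000000000000000
```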
ANAIS DOTIS-GEORGIOU: 26:58
So, by not having to define your schema beforehand, it just allows you that much more flexibility. Finally, edge data replication makes it possible to securely write data at the edge and then replicate that data from the edge to a consolidated source like InfluxDB Cloud. And that helps bring OT and IT closer together and eliminate some of those silos, so that OT can focus on writing data to InfluxDB at the edge. And then you can replicate that data with a durable queue to InfluxDB Cloud. And then your IT organization can maybe do some more work with the analytics there and leverage other tools and build analytics solutions on top of it. We also have Telegraf. So, Telegraf is an open source collection agent for metrics and events. It’s configurable through a single TOML configuration file. You also have the ability to reduce the binary size, so if you want to run it on any edge device or run it as a sidecar, you can do so very easily.
ANAIS DOTIS-GEORGIOU: 28:12
And Telegraf has input plugins for MQTT, for OPC UA, and a bunch of different standard protocols for industrial IoT, which means that you can easily take data from a variety of different sensors, publish it to an MQTT broker, and read it from that MQTT broker with Telegraf. You could even use Telegraf to do things like perform forecasting on micro-batches, because there are also processor plugins in Telegraf, and there are plugins that make Telegraf extensible in any language. Those are called the execd plugins, and they let you really easily take data from any source, process it in any way that you want, and write it to any data store of your choosing. So, Telegraf is a hugely powerful tool and a critical part of a data pipeline solution, especially in industrial IoT. Speaking of the industrial IoT space and integrations, here are some of the partners and integrations that we have. We have integrations with Factry, with Grafana, with Superset, Tableau, etc. For data persistence, we have InfluxDB running at the edge, in your data center, and in the cloud. For middleware, we have Telegraf. We also have Kepware integrations, HiveMQ and other MQTT brokers, HighByte, NiFi, etc.
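As a tiny illustration of the micro-batch processing idea, here is a function that parses the field value out of each line protocol record and flags readings over a threshold. Wrapped in a stdin/stdout loop, logic like this is the kind of thing you could run under Telegraf’s execd processor plugin. The parsing is deliberately naive (a single numeric field per record) and the data is hypothetical.

```python
def process_batch(lines, threshold=100.0):
    """Toy micro-batch processor: pull the single field value out of
    each line protocol record and keep only readings above a threshold.
    (Naive parsing, illustrative only -- real line protocol parsing
    must handle multiple fields, escaping, and types.)"""
    flagged = []
    for line in lines:
        # Shape assumed: "measurement,tag=... field=value timestamp"
        field_section = line.split(" ")[1]
        value = float(field_section.split("=")[1])
        if value > threshold:
            flagged.append(line)
    return flagged

batch = [
    "vibration,arm=a1 rms=12.5 1700000000",
    "vibration,arm=a2 rms=140.2 1700000001",
]
print(process_batch(batch))  # only the a2 reading exceeds the threshold
```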
ANAIS DOTIS-GEORGIOU: 29:46
And in this way, we connect to PLCs, robotics, and sensor devices. We get data from plants and factories, etc. And for platforms, we also have integrations with things like ThingWorx, IO Base, Ignition, etc. And so last but not least, I want to take some time to go over some customer examples. So, the first one I want to talk about is Terega with InfluxDB and IO Base. So, Terega Solutions are creators of digital solutions that are used to improve energy efficiency and address decarbonization challenges. A really cool company in the environmental engineering space. And they are the creators of IO Base, which is a cloud-based IoT historian powered by InfluxDB. And they create this cloud-based digital twin for their clients to allow them to collect data from production sites and view it in real time. So, their first customer was Terega, which has a network of 5,000-plus kilometers of gas pipeline within France. And they aim to help France attain carbon neutrality by 2050. So, they do that with IO Base, and they’ve created IO Base to aid in the digital transformation of Terega’s data ecosystem.
ANAIS DOTIS-GEORGIOU: 31:13
So this points to sort of a paradigm shift, where we’re moving from data historians to InfluxDB with full cloud connectivity and security, no local IT, no firewall exceptions, and no software to install. That just gives you agility and the ability to access your data in a simplified manner and perform all the operations that you might need, like monitoring your pipeline, any on-call actions that you need to take, analytics, or reporting. That’s as opposed to previously, where there were issues with firewalls, with security, with bandwidth, with information silos, with engineering silos, and also vendor lock-in, issues with data being scattered, and discrepancies between data from different sources. And, in general, it was impossible to maintain, really just relying on particular experts who had been maintaining those data historians. And so, I want to take a look at what IO Base looks like right now. Basically, IO Base and Indabox can be operated independently but deployed together. And they basically provide centralized master data with minimum on-site infrastructure to collect your data. And it allows you to easily share your data, provides hardware- and network-agnostic capabilities so that you can leverage any type of hardware that you might need and be flexible, and also offers the highest level of cybersecurity so that you don’t have to worry.
ANAIS DOTIS-GEORGIOU: 33:03
I mean, the fact that it’s being used for something like oil and gas pipelines, with such high risk at stake, really goes to exemplify how there is that shift in perspective about what your security needs are, what they look like, and how you address those issues and still provide solutions. The other thing I want to talk about is an FTSE 500 company. So, they use SCADA with OPC UA and Modbus, and then they use Telegraf instances to pull data from Modbus and OPC UA. And they pull that into InfluxDB at the edge, behind a firewall. And then they use a consolidated InfluxDB instance and are able to pull data from the edge into it to actually perform their platform monitoring in general. So specifically for aerospace companies, we’re looking at things like smart factory IoT use cases, where we are building components for aerospace. And we’re pulling data from MQTT brokers using the Telegraf MQTT plugin. And we’re then able to pull data into InfluxDB servers, visualize all our data with Grafana dashboards, and create things like custom alerts and notification rules around a lot of those metrics, so that we can correctly identify and alert the correct operations teams so they can respond to some of that data in real time.
ANAIS DOTIS-GEORGIOU: 34:54
And in this sense, too, when you build your data pipeline in this configuration, with MQTT data being published to a Telegraf server, you have a really scalable architecture, especially if you leverage something like an MQTT broker in between Telegraf and your MQTT devices. So, with that being said, I want to invite you to get started using InfluxDB at influxdata.com. And if you want to learn more about how to use InfluxDB, I encourage you to take advantage of all the self-service content we have. We have a lot of blogs that show exactly how to get started with architectures exactly like this one. And I also want to encourage you to leverage the InfluxCommunity org as well. I’ll share that with you right here. So, the InfluxCommunity org is a place that hosts Influx community projects. You can search for whatever technology you want, but here’s a great example of one that you could get started with. It creates three robotic arms that generate dummy robotic arm data. That data is MQTT data. We then use HiveMQ to pull that data. And then we use Telegraf to take the data from HiveMQ. And then that data is being written to InfluxDB.
ANAIS DOTIS-GEORGIOU: 36:36
Quix is then pulling that data from InfluxDB. It’s acting as the data processing engine. It is performing some machine learning for anomaly detection on those robots to see when any one of those robots is malfunctioning. It is using transformers that have been trained on Hugging Face, pulling those weights down from Hugging Face to do the anomaly detection, writing any anomalies back to InfluxDB, and then we’re visualizing those anomalies and alerting on them with Grafana. And so that’s just one example of a variety of different demos that exist within Influx Community, where there are full solutions being built that you can run locally, but that could easily be scaled to accommodate the needs of an industrial IoT use case. So that’s another really good resource. Our documentation is also fantastic. And then we have InfluxDB University as well. It’s also currently undergoing kind of an overhaul for InfluxDB v3 so that we can provide a lot more InfluxDB v3 content. But InfluxDBU is a place where you can get self-paced and live training on all things InfluxDB and even earn some badges if you want. Some InfluxDB community resources include Slack and our community forums.
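The demo itself uses transformer models from Hugging Face, but the shape of the anomaly-detection step can be illustrated with something much simpler. This sketch swaps in a rolling z-score detector as a stand-in; the window size and threshold are arbitrary illustrative values:

```python
from statistics import mean, stdev

def zscore_anomalies(values, window=10, threshold=3.0):
    """Flag indices whose value deviates more than `threshold` standard
    deviations from the trailing window of readings."""
    anomalies = []
    for i in range(window, len(values)):
        ref = values[i - window:i]
        mu, sigma = mean(ref), stdev(ref)
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# A steady signal with one obvious spike at index 15.
series = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0,
          1.02, 0.98, 1.01, 0.99, 1.0, 9.0, 1.0, 1.01]
print(zscore_anomalies(series))  # index 15 should be flagged
```

In the actual pipeline, a stream processor like Quix would run this kind of check on each micro-batch and write the flagged points back to InfluxDB for Grafana to alert on.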
ANAIS DOTIS-GEORGIOU: 37:54
I talked about our docs. I talked about InfluxDB University already. And some other InfluxDB resources include some other demos that you can watch now through that link. Again, a copy of this presentation will be made available to you after. So, you’ll get access to all these links as well. So, I want to thank you so much for your time and take a moment now to address any questions that you might have. So let me find my Zoom box here so that I can see any questions that you have. And let me stop my share as well. So, we have some questions. Does Telegraf have a converter from OPC DA to OPC UA? I know it has an OPC UA input plugin. I don’t know about conversion from DA to UA, so I would need to check on that. But I can give you this link here. Let me change to everyone. So, there’s a link there for the OPC UA plugin so that you can read more about it.
ANAIS DOTIS-GEORGIOU: 39:16
And we also have another question that says: you mentioned that with that InfluxDB architecture, the challenges of maintaining a firewall will be mitigated. However, in the Fortune 500 customer example, you showed that they pulled the data behind a firewall. Please clarify. Thank you for that question. We have people doing it both ways, and I think that a lot of that has to do with comfort level, the specific use case, and the specific security concerns that exist within the context of each problem that people are trying to solve. There are also government regulations and compliance that people have to address. So sometimes it’s not up to the individual customers, and sometimes some of those compliance concerns and requirements are outdated. So, it just depends on the individual use case. And how is InfluxDB different from AWS IoT SiteWise? Pros and cons. So, you can also run InfluxDB there; InfluxDB v2 is offered through AWS as well. So, there are some different pros and cons with just the versions there, and costs, and the support that you get. But they’re entirely different products as well.
ANAIS DOTIS-GEORGIOU: 40:46
And then you can also run InfluxDB Clustered on-prem, but you could also manage it yourself. And then I think some of the advantages and disadvantages there really come down to the support and understanding of how you’d run that exactly. I have another question: how does a typical extract step work technically to offload data, for instance, to Snowflake? So, all the features with Iceberg are coming soon. I don’t have a specific example for that extract step right now, but that will be coming shortly, so I would just ask primarily that you wait for that. Right now, otherwise, that’s going to look like leveraging the API, actually querying data, pulling it in, and writing it. So, it’s going to look quite different when the Iceberg feature lands at the beginning of this coming year. But for those asking that question, you might enjoy this webinar that you can access on demand that has more information about Iceberg and the upcoming release. And same for people asking for visibility on InfluxDB in Azure. Does the rise of time series databases mean the end of traditional plant historians?
ANAIS DOTIS-GEORGIOU: 42:20
I have trouble thinking that it’s going to be the end of plant historians. There are certain things within industrial IoT that are so well-defined in the way that they are modeled. I can think of continuous stirred tank reactors as a good example of that, and some chemical reactions, where they’re so well-defined, and the math behind how they are modeled is so specific to the actual reactor itself and how it’s built, that I have trouble seeing that they will be replaced anytime soon. But that being said, you could absolutely replace them. I guess I’ll put in a plug for a training that I’m going to be doing this Thursday showing exactly how you could create a digital twin yourself with InfluxDB, Kafka, and Faust Streaming, which is an open source project for data pipelining. And I do specifically create a digital twin for a continuous stirred tank reactor based off this repo here, or this blog. One second. Oh, lost my Zoom chat. But it’s hard for me to imagine that they’ll be entirely replaced. Right now, I more see them being used alongside time series databases, although the IO-Base example is one where it was replaced. So, I think it is possible.
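For a sense of why those reactor models are so well-defined, here is a minimal sketch of an isothermal CSTR with a first-order reaction A → B, integrated with forward Euler. All parameter values are illustrative and not taken from the digital-twin training:

```python
# Mass balance for reactant A in a CSTR: dCa/dt = (q/V)*(Caf - Ca) - k*Ca
# where q is flow rate, V reactor volume, Caf feed concentration,
# and k the first-order rate constant. All values below are made up.

def simulate_cstr(ca0=1.0, caf=2.0, q=0.5, v=1.0, k=0.3, dt=0.01, steps=2000):
    """Integrate the mass balance with forward Euler; return the trajectory."""
    ca = ca0
    history = [ca]
    for _ in range(steps):
        dca = (q / v) * (caf - ca) - k * ca
        ca += dca * dt
        history.append(ca)
    return history

trajectory = simulate_cstr()
# Analytical steady state for comparison: Ca* = (q/V)*Caf / (q/V + k)
steady_state = (0.5 / 1.0) * 2.0 / (0.5 / 1.0 + 0.3)
print(trajectory[-1], steady_state)
```

A digital twin would write each simulated step to InfluxDB alongside the real sensor readings, so the two can be compared and alerted on in real time.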
ANAIS DOTIS-GEORGIOU: 44:19
But I think a lot of that industry moves so slowly because of various security requirements, because of the way that things have been modeled for so long, and because of some of the control systems that are already in place, that it takes a while for some of that overhaul to be done from a hardware perspective as well. So again, some of these product roadmap questions can be answered through that product roadmap update. But Iceberg integrations and OSS versions will be available at the very beginning of this coming year. So, there aren’t any built-in AI/ML capabilities in InfluxDB 3.x. What you can do is leverage tools like Bytewax, like Kafka and Faust, like Quix, like Mage AI, which is like an open-source alternative to Airflow. Or Apache Flink, for example; you could leverage that. There’s a JDBC driver for that which I believe you could use really easily with InfluxDB v3. Anyways, there’s a variety of different stream processing tools out there. We have a lot of tutorials for how to use and leverage a wide variety of those, most of the ones that I just mentioned. And you could leverage those alongside InfluxDB to do any sort of machine learning or anomaly detection that you want on top of that.
ANAIS DOTIS-GEORGIOU: 45:56
And I’d encourage you again to go to the Influx Community organization, search for forecasting or machine learning, and you’d find examples for how to use a variety of those tools with InfluxDB. Someone asks, “Does it have interpolation capability?” I’m not sure exactly what you’re referring to. But yes, the fact that we have Parquet in the backend, and the ability to read your Parquet files directly with Iceberg coming very shortly, will enable a lot of interoperability. And the fact that we leverage Apache Arrow, and Arrow Flight as a result, means that we have a really fast ability to pull that data together and query it, so that you can have interoperability with any other tools that use Arrow Flight. And then let me check the actual Q&A as well to make sure I’m answering all your questions. So, the InfluxDB v3 Docker image will be available for use at the beginning of next year. I answered that one already. And then John Boyd asks— I don’t know why I just called you out. I never called anyone else out, but I called you out today.
ANAIS DOTIS-GEORGIOU: 47:35
He asks, “How about Rockwell Automation using InfluxDB as their new edge historian in FactoryTalk View?” So yeah, you can also learn more about that here as well, for anyone who wants to learn more about how to configure InfluxDB into that platform and learn about all the aspects of that connector as well. And let me go back to the chat to make sure. So, I don’t see any more questions coming in from the chat. I really appreciate all your questions today. Can it handle on-the-fly calculations? So, if you’re using Telegraf, you can do some— I’m assuming what you mean by that is some preprocessing before you’re writing into InfluxDB. You can use SQL to query InfluxDB, as well as InfluxQL, which is a SQL-like query language specific to InfluxDB. In v3, you can use SQL. So, you can perform some on-the-fly calculations using SQL when you query. You can also do some preprocessing with something like Telegraf. Like I mentioned, those execd plugins make Telegraf extensible in any language. Here’s an example of using InfluxDB to monitor a home brewing setup and perform some forecasting with exponential smoothing in micro batches before data is even sent to InfluxDB, so that we can detect anomalies in the brewing data and write those anomalies to InfluxDB. So that’s another example.
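As a small, runnable stand-in for those on-the-fly SQL calculations, the snippet below uses Python’s built-in sqlite3 to compute a time-bucketed average at query time. Against InfluxDB v3 you would express the same idea in SQL through a client library; the table name, column names, and readings here are made up:

```python
import sqlite3

# In-memory table standing in for a measurement of brewing temperatures.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE brew (ts INTEGER, temperature REAL)")
readings = [(0, 20.0), (30, 21.0), (60, 22.0), (90, 23.0), (120, 24.0)]
conn.executemany("INSERT INTO brew VALUES (?, ?)", readings)

# Average temperature per 60-second bucket, computed entirely at query time.
rows = conn.execute(
    "SELECT (ts / 60) * 60 AS bucket, AVG(temperature) "
    "FROM brew GROUP BY bucket ORDER BY bucket"
).fetchall()
print(rows)
```

The same aggregate-per-time-bucket shape is what you would run against InfluxDB v3’s SQL interface, just with real timestamps instead of integer seconds.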
ANAIS DOTIS-GEORGIOU: 49:36
And in general, I’m usually a fan of using statistical methods for forecasting when it comes to univariate time series where you are looking to perform short-term forecasts, because those usually outperform machine learning methods; machine learning methods and neural nets in general typically do really well for multivariate time series data with covariates. And anyways, so there’s an example of that. That’s my little shameless plug, or hot take, on machine learning and time series. Is it possible to do linear regression in InfluxDB? So, I don’t think that there’s any linear regression SQL function. DataFusion is used as our query execution framework, and it enables us to query InfluxDB in SQL. You can always look at the reference architecture for DataFusion, because if anyone contributes any new function to DataFusion, then InfluxDB will automatically have it. That’s part of InfluxDB being a part of this open data architecture: contributing to these upstream projects that other companies and individuals are also contributing to means we benefit from that larger contribution community.
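As a tiny illustration of the statistical-methods point above, simple exponential smoothing needs only a few lines and gives a usable one-step-ahead forecast for a univariate series. The smoothing factor and readings below are made up:

```python
def exponential_smoothing(values, alpha=0.3):
    """Simple exponential smoothing; the final smoothed level doubles as
    a one-step-ahead forecast for a univariate series."""
    level = values[0]
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
    return level

# One micro-batch of (hypothetical) temperature readings.
batch = [18.0, 18.5, 19.0, 19.2, 19.5, 19.4, 19.8]
forecast = exponential_smoothing(batch)
print(round(forecast, 2))
```

This is the kind of per-micro-batch computation that can run in Telegraf or a stream processor before the data ever reaches InfluxDB.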
ANAIS DOTIS-GEORGIOU: 51:05
And so, if anyone ever adds any sort of linear regression functions with SQL and they land in DataFusion, that’s the first place that you’ll see them, and then you’d see them in InfluxDB as well. Currently, in v3, I’m not aware of any way to do linear regression out of the box. Again, you can do it with Telegraf. You can use APIs to pull data out of InfluxDB. You can use JDBC drivers to pull data out of InfluxDB and do linear regression that way. Here’s a blog post, for example, on how to leverage the driver for InfluxDB in Tableau and use Tableau to produce some forecasts and linear regression. Yes, we are working on deletes. That’s all I have to say. Apologies. And people are very aware that there is a desire for specific delete capability with 3.0, and it is being prioritized. Again, I hate to do this, but all your questions about the product roadmap are better answered in the link I shared above.
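A sketch of that client-side pattern: once the points have been pulled out of InfluxDB (via the API or a driver), an ordinary least-squares fit needs nothing more than the closed-form slope and intercept formulas. The timestamps and values below are made up:

```python
def linear_fit(xs, ys):
    """Return (slope, intercept) minimizing squared error (ordinary
    least squares, closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Timestamps (seconds) vs. a metric rising roughly 2 units per second.
ts = [0, 1, 2, 3, 4]
metric = [1.0, 3.1, 4.9, 7.2, 9.0]
slope, intercept = linear_fit(ts, metric)
print(slope, intercept)
```

The same few lines work on any series queried out of InfluxDB, whether the fit is done in a script, a Telegraf execd plugin, or a downstream tool like Tableau.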
ANAIS DOTIS-GEORGIOU: 52:33
But I really appreciate you asking so much about the product roadmap, because it kind of points to an indication that maybe I should bug Gary to do another one of those webinars; it sounds like you are very interested in that. All right. Well, I don’t see any more questions coming through. So, with that, I really want to thank you so much for your time. If you do think of more questions, please feel free to reach out on the community Slack or forums. I’d be happy to answer your questions there. Thank you so much for joining us today. And again, a copy of this presentation will be made available to you shortly. So, thank you so much. Have a great day. Bye all.
[/et_pb_toggle]
Anais Dotis-Georgiou
Developer Advocate, InfluxData
Anais Dotis-Georgiou is a Developer Advocate for InfluxData with a passion for making data beautiful with the use of Data Analytics, AI, and Machine Learning. She takes the data that she collects, does a mix of research, exploration, and engineering to translate the data into something of function, value, and beauty. When she is not behind a screen, you can find her outside drawing, stretching, boarding, or chasing after a soccer ball.