InfluxData Blog - Jason Myers

Predictive Analytics Pipelines: Real-World AI, Predictive Maintenance, and Time Series Data

Jason Myers (InfluxData) — Tue, 23 Jul 2024 08:00:00 +0000

Real-world AI

There’s so much talk about AI these days that it seems we quickly forget that AI isn’t a single type of technology. It’s a category, almost an umbrella term for a wide range of different technologies, applications, and approaches. The terms “Generative AI” and “Machine Learning AI” (often referred to as “Real-World AI”) describe two different branches that fall under the broader AI heading.

Generative AI refers to models and algorithms designed to generate new content, such as text, images, music, or even code, similar to the data used to train those models. However, we’re more concerned with real-world AI. This branch of machine learning uses algorithms and statistical models that rely on patterns and inference to enable computers to perform specific tasks without explicit instructions. When we talk about creating smarter systems, real-world AI is the driving force behind them.

Industrial operations constantly strive for optimization and improvement. Even a 1% increase in efficiency can translate into millions of dollars in savings for a company. Real-world AI allows companies to use data to drive improvements and boost their bottom line. In the industrial sector, data-based predictions can have a huge impact. Let’s take a look at what goes into prediction processes and then consider a real-world example.

Predictive analytics

Predictive analytics lets organizations foresee future events and outcomes. This allows businesses to enhance operational efficiency, mitigate risks, and secure a competitive advantage.

Understanding Predictive Analytics

Predictive analytics utilizes statistical models and data analysis to predict future events based on historical and current (in the case of time series data, real-time) data. This approach helps identify potential risks and opportunities, enabling proactive decision-making. Companies don’t need to rely on traditional methods based on ‘anec-data’ of past experiences or intuition.

Predictive analytics helps companies across a wide range of verticals to streamline operations, reduce downtime, and allocate resources efficiently. For example, in manufacturing, predictive analytics can forecast equipment failures through the analysis of maintenance records and sensor data, allowing for timely maintenance and reduced operational disruptions.

Key Benefits of Predictive Analytics in Industrial Operations

When implemented and used correctly, predictive analytics provides significant advantages to organizations in the industrial and manufacturing sectors.

Reduced Downtime: Reactive maintenance, which is basically waiting for something to break, is extremely disruptive. By predicting equipment failures, organizations can schedule maintenance proactively. This ensures that they can take steps to minimize the impact of taking a production line offline, better coordinate service personnel, and enhance productivity overall.
Increased Operational Efficiency: Overall equipment effectiveness (OEE) is a key metric for industrial operators. The more efficient and consistent production machinery is, the better it is for companies. Predictive analytics identifies inefficiencies and bottlenecks in complex industrial processes. This allows operators to pinpoint trouble areas and make more informed decisions.
Improved Product Quality: Related to the previous point, the more consistent and reliable industrial equipment is, the more consistent its output will be. Real-time data analysis helps identify production anomalies, ensuring high-quality outputs. This reduces waste and saves money.
Enhanced Inventory and Supply Chain Management: The ripple effect of predictive analytics impacts the supply chain as well. Companies can better understand and forecast the demand for raw materials. Optimizing inventory purchasing, storage, and management can reduce costs and improve service levels.

Predictive analytics in action

Having covered some of the key benefits of predictive analytics, let’s now quickly survey some of the techniques and approaches that companies use to achieve them.

As we move deeper into Industry 4.0, using AI becomes increasingly common. Predictive analytics in an industrial context involves using historical data, statistical algorithms, and machine learning techniques to predict future outcomes.

Industrial sensors and systems generate all kinds of data, and most of it is time series data (i.e., data with a timestamp). Tracking change over time is critical to establish baselines and understand how machinery functions compared to those baselines.

Time series data and machine learning are the foundation of predictive analytics, and InfluxDB is a purpose-built time series database used to manage and process that data. Let’s take a look at a real-world example of predictive analytics in the form of predictive maintenance.

LBBC Technologies

LBBC Technologies is the world’s leading designer and manufacturer of industrial autoclave technology. Aerospace customers use this equipment in the manufacture of high-performance castings, like turbine blades. LBBC provides support for customers all over the world. All LBBC equipment comes fitted with industrial gateways that simplify data connections between industrial PLCs and web services, like AWS. LBBC uses these technologies to offer ‘Connected Support’ and Web SCADA to its customers.

We can see real-world AI and time series data at work on the Core Leaching machines LBBC supplies to many aerospace manufacturers. The core leaching process uses potassium hydroxide, a dangerous and costly chemical. During the process, the potassium hydroxide slowly generates silicates that impair the machine’s ability to leach cores as they gradually build up.

High silica levels increase the likelihood of quality defects, but measuring silica levels and predicting when a customer will need to replace the potassium hydroxide is a technical challenge. Using InfluxDB to collect and visualize data from these core leaching machines, LBBC spotted a data pattern that allowed them to monitor potassium hydroxide conditions without complex, sensitive, and very expensive online analysis equipment or cumbersome laboratory tests.

Ultimately, what the data showed was that the build-up of silicates impacted the naturally low vapor pressure of strong hydroxides. Having identified this relationship, LBBC uses InfluxDB to process data to quantify hydroxide quality instead of making mechanical changes to the machinery itself, or investing in other detection methods.

Over the course of a year, LBBC collected data to use for their calculations, including process details about pressure, temperature, valves, pumps, and other important characteristics. The data spanned 150 cycles with both fresh and spent potassium hydroxide and totaled over 1.3 million data points. Figure 1 shows a data visualization for one of these cycles. The data processing algorithm isolates certain sets of data for each cycle and applies least-squares regression to generate a new variable called Resting Vapor Pressure (RVP).

Figure 1

Using this algorithm, LBBC confirmed that RVP increases as silicates build up and then falls when the customer refreshes the potassium hydroxide supply. Armed with the ability to generate this information from time series data, LBBC tracks RVP and proactively notifies customers when RVP reaches a level that indicates that changing potassium hydroxide will mitigate quality issues.

Figure 2
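
To make the regression step more concrete, here is a minimal Python sketch of an ordinary least-squares fit over readings from a single cycle, plus a simple check for cycles whose derived value has reached an alert limit. The article does not describe LBBC’s actual algorithm, so the input shape, the sample numbers, and the helper names (least_squares_fit, cycles_over_limit) are assumptions made purely for illustration.

```python
from statistics import fmean


def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cycles_over_limit(values_by_cycle, limit):
    """Return the cycle ids whose derived value is at or above the alert limit."""
    return [cycle for cycle, value in values_by_cycle.items() if value >= limit]


# Hypothetical readings from one cycle: seconds into a selected window, pressure in bar.
seconds = [0, 10, 20, 30, 40]
pressure = [1.00, 1.10, 1.20, 1.30, 1.40]
slope, intercept = least_squares_fit(seconds, pressure)
```

Reducing each cycle to one fitted number in this way yields a single value per cycle that can be tracked over time and compared against a limit.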

LBBC Technologies continues to refine their predictive maintenance approaches and best practices as they gain deeper expertise working with time series data.

To get started working with your time series data, try InfluxDB today.

Unified Namespace and InfluxDB: Streamlining IIoT Operations for Industry 4.0

Jason Myers (InfluxData) — Wed, 17 Jul 2024 08:00:00 +0000

The Industrial Internet of Things (IIoT) has revolutionized the way industries operate, enabling businesses to collect and analyze data from their operations in real-time. However, managing and analyzing data from diverse sources can be a challenge. While sensors and systems may use the same transport protocols, the shape and type of data generated can vary from one device to another. A lack of uniform, clean data creates challenges and obstacles when it comes to getting timely insights. This is where a unified namespace and InfluxDB come into play.

We’ll look at how a unified namespace can simplify data management, ensure seamless data flow, provide flexible data storage, and enable robust data visualization for IIoT operations, ultimately leading to improved efficiency, productivity, and profitability in Industry 4.0.

The power of a unified namespace: simplifying data management

Data is the lifeblood of industrial operations in Industry 4.0. A unified namespace offers a single, consistent way to organize and access data from various devices, sensors, and systems. As systems become increasingly complex and data silos become a greater impediment to progress, a unified namespace acts as a centralized data hub.
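
One way to picture a "single, consistent way to organize" data is a small normalization step that maps device-specific payloads onto one shared record shape. The Python sketch below is only an illustration: the site/area/device path layout, the alias table, and the field names are hypothetical and are not taken from any InfluxDB or unified namespace specification.

```python
def normalize(site, area, device, payload):
    """Map a device-specific payload onto one shared record shape."""
    # Hypothetical aliases: different vendors often name the same reading differently.
    aliases = {
        "temp": "temperature_c",
        "temperature": "temperature_c",
        "press": "pressure_bar",
        "pressure": "pressure_bar",
    }
    fields = {}
    for key, value in payload.items():
        name = aliases.get(key.lower())
        if name is not None:  # readings without a known alias are ignored in this sketch
            fields[name] = float(value)
    return {"path": f"{site}/{area}/{device}", "fields": fields}


record = normalize("plant1", "line2", "plc7", {"Temp": "21.5", "status": "ok"})
```

Whatever the real schema looks like, the point is that every consumer reads the same names and the same hierarchy regardless of which device produced the data.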

A time series database, like InfluxDB, is an ideal option for this data management layer because it can handle data collection and storage in a single place. Designed for unique time series workloads at any scale, InfluxDB eliminates the need for complex data transformations and mappings. It also saves time and resources and enables seamless data integration from different sources, making interoperability and collaboration across the entire industrial ecosystem easier.

A unified namespace also provides additional administrative advantages. For example, it can enhance data security and privacy by centralizing access control and governance. This means you can maintain strict control over who has access to sensitive data, minimizing the risk of unauthorized use or breaches.

It also provides the opportunity for more holistic data analysis and reporting. Storing all your data in a single data store enables unified views of data from multiple sources. This allows you to make connections and glean insights that would otherwise be inaccessible with fragmented and siloed data. As organizations strive to be data-driven, a unified namespace makes it easier to extract insights, identify trends, and make informed decisions that drive operational efficiency, productivity, and profitability.

Real-time data ingestion: ensuring seamless data flow

Real-time data ingestion is crucial for collecting data at the same speed that industrial sources generate it.

For applications that generate large volumes of data, such as sensor networks or IoT devices, Telegraf is a powerful tool for real-time data ingestion. Telegraf is a lightweight data collection agent that collects data from various sources. It can process that data and then output it to any desired data store, like InfluxDB. It has over 300 plugins, so it’s capable of handling virtually any type of data. Plus, it’s open source, so you can write your own custom plugins too. (Check out Telegraf Basics at InfluxDB University for more info on that.) But plugins for common IIoT/OT protocols already exist, e.g., MQTT, Modbus, OPC UA, and more.

InfluxDB also supports client libraries in multiple languages to ingest data directly into the database. So regardless of what generates your data or how you transport it to your unified namespace data store, InfluxDB can ingest that data in real time.
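
For readers who want to see what a direct write looks like without a client library, the sketch below builds (but does not send) an HTTP request for InfluxDB’s v2-style write endpoint using only the Python standard library. The host, organization, bucket, and token are placeholders, and which write endpoints your deployment exposes depends on the InfluxDB version, so treat this as an assumption to check against your own setup.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_write_request(base_url, org, bucket, token, lines, precision="ns"):
    """Build (but do not send) a POST request carrying line protocol to /api/v2/write."""
    query = urlencode({"org": org, "bucket": bucket, "precision": precision})
    body = "\n".join(lines).encode("utf-8")  # one point per line
    return Request(
        url=f"{base_url.rstrip('/')}/api/v2/write?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
    )


# Placeholder values; none of these come from the article.
request = build_write_request(
    "https://influxdb.example.com",
    "my-org",
    "factory",
    "MY_TOKEN",
    ["autoclave,machine=m1 pressure=1.2 1721721600000000000"],
)
```

Passing the request to urllib.request.urlopen would send it. In practice, a client library or Telegraf adds batching and retries on top of this basic call.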

Flexible data storage: adapting to diverse industrial needs

Recency bias tends to be strong with time series data. As it ages, it becomes less useful, and some businesses purge it. However, the increasing desire to implement machine learning and predictive processes (e.g., analytics, maintenance, etc.) requires high-resolution, historical data.

This means that industrial operators that are leaning into digital transformation and evolving into an Industry 4.0 context need to factor data storage into their plans. Here, again, we see the benefit of a unified namespace architecture with a database like InfluxDB.

For short-term storage, you can utilize single-node edge instances. While these may have more limited resources compared to a central hub, they’re great for users at the edge who need that data locally for critical decisions and insights. You can then send that data to a centralized instance using the edge data replication (EDR) feature. The nice thing here is that everything plays nicely together, and you only need to keep the most important data at the edge.

Having a central hub for storage and data analysis allows you to take advantage of InfluxDB’s data compression capabilities. First, the architecture of the database itself (a columnar data store) enables it to compress data efficiently. Second, it uses Apache Parquet as its persistence format, which also has high compression ratios and further compresses the data. So, you ultimately save more data in less space. The final element here is saving those Parquet files on low-cost object storage, which can save 90% or more on storage costs.

Organizations no longer need to compromise between getting deep insights and storage costs.

Robust data visualization: gaining insights from industrial data

Data analysis and visualization enable businesses to unlock valuable insights from their industrial data. By transforming raw data into visual representations, organizations can gain a deeper understanding of their operations, identify trends and patterns, and make data-driven decisions that drive success.

When it comes to data visualization, InfluxDB leans on integrations with best-of-breed solutions. The combination of Telegraf, InfluxDB, and Grafana is known as the TIG stack. Grafana is an open source data visualization tool with a native InfluxDB integration. InfluxDB also supports tools like Tableau and Apache Superset for data visualization.

Using InfluxDB as the backbone of your unified namespace approach means that you can use your data visualization tool of choice to power real-time data queries. Because all your data is in a single data store, you can query across dimensions and generate insights otherwise unavailable with legacy data historians and siloed data stores. The result is a holistic view of industrial operations and the ability to correlate optimizations and efficiencies across different areas of production to improve overall output.

InfluxDB also supports the integration of machine learning algorithms, allowing businesses to leverage advanced analytics for predictive maintenance, anomaly detection, and root-cause analysis. By applying machine learning techniques to industrial data, organizations can identify potential issues before they occur, minimize downtime, and ensure uninterrupted operations.
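
As a small taste of what anomaly detection on time series data involves, the Python sketch below flags readings that sit far from the mean of a trailing window. This is a simple statistical stand-in rather than a machine learning model, and the window size, threshold, and sample readings are arbitrary assumptions.

```python
from statistics import fmean, pstdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value is more than `threshold` standard deviations
    away from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean, spread = fmean(history), pstdev(history)
        if spread == 0:
            continue  # a perfectly flat history gives no scale to score against
        if abs(values[i] - mean) / spread > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical sensor readings with one obvious spike at index 6.
readings = [20.0, 20.1, 19.9, 20.0, 20.1, 20.0, 27.5, 20.1]
spikes = find_anomalies(readings)
```

A production pipeline would typically query the history from the database and use a model suited to the signal, but the shape of the problem is the same: compare new points against a baseline learned from past data.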

Next steps

The evolution to Industry 4.0 capabilities does not happen overnight. Industrial operators with large distributed systems that want to optimize productivity across their operations should consider the benefits that a unified namespace provides. Not only does it simplify data management by normalizing data from an array of sources, but once that data exists in a database, you can use and extend it in new ways because of the capabilities of a datastore like InfluxDB. At a time when real-time data collection and queries matter for large workloads of high-resolution data, a unified namespace can enable efficiencies and optimizations that give you a leg up on the competition.

InfluxDB 3 Product Update Round-Up: Q2 2024

Jason Myers (InfluxData) — Mon, 08 Jul 2024 08:00:00 +0000

Wrapping up another quarter provides an ideal time to look back on the features we rolled out across various products. Software is never finished, and our engineers have been working hard to deliver improvements to InfluxDB 3. This roundup highlights some of the developments and releases over the last few months.

Telegraf v1.31

Telegraf is an open source server agent for collecting metrics and data from various sources, processing them, and sending them into a time series database like InfluxDB or other storage destinations. It’s lightweight and requires minimal configuration, making it highly efficient for gathering and reporting metrics in real-time.

The latest version of Telegraf introduced several new plugins, including a parser for Parquet and a range of new configuration options. You can check out the detailed release notes or download the latest version here.

Single Sign-On (SSO)

Single Sign-On is a user authentication process that allows individuals to access multiple applications or systems with just one set of credentials (such as a username and password). This technology simplifies the user experience by eliminating the need to remember and enter different service credentials. Enterprise environments often rely on SSO to enhance security and streamline user access by centrally managing authentication across various platforms. It not only boosts productivity by reducing password fatigue among users but also helps maintain higher security standards by enabling centralized control over user access and authentication.

We recently introduced SSO for InfluxDB Cloud Dedicated, our fully-managed, single-tenant cloud offering. Our team did a great job of streamlining the configuration process so customers using Cloud Dedicated can get their SSO up and running fast.

Check out the full product update and the accompanying documentation for more details.

Dedicated Dashboard

We heard from customers that they wanted to be able to monitor their InfluxDB Cloud Dedicated instances. So, we put some time and resources into delivering that observability for them. Now, any Cloud Dedicated customer can access information about their cluster using a Grafana dashboard. Having insight into cluster health and performance can help organizations better plan and allocate resources.

The full product update and documentation are available to learn more about this feature.

Stay tuned!

We have a bunch more features waiting in the wings, so keep an eye out for more product updates in the near future. Until then, check out the technical challenges our team is working on, like optimizing Parquet Bloom filters, data partitioning, and how to make queries run 100x faster.

Dealing with Mountains of IoT Data: An IIoT World Webinar Reflection

Jason Myers (InfluxData) — Wed, 03 Jul 2024 08:00:00 +0000

We’ve made the case many times that instrumentation is critical for understanding changes in the physical and virtual worlds. During this recent webinar, panelists discussed the challenges and opportunities of integrating IoT sensors into existing infrastructure, ensuring data quality and accuracy, and leveraging sensor data for operational efficiency and productivity.

Moving past manual processes

Despite all the ink spilled about digital transformation and Industry 4.0, many industrial and manufacturing processes remain manual. On one hand, this makes sense. These machines and systems are expensive, complex, and last for a long time. If it’s not broken, don’t fix it.

But what if you could make this equipment last even longer? Panelists discussed the advantages of instrumenting industrial environments, i.e., putting sensors on things. Industrial manufacturers can gain valuable insights by harnessing the vast amounts of data generated by IoT sensors.

Manufacturers often face challenges when adopting IoT technology, including apprehension about new technology and the sheer volume of data that IoT sensors generate. However, the panelists believe manufacturers can overcome these challenges by understanding the underlying problems technology can solve and by integrating this data into automated platforms.

As Balaji Palani, VP of Product Marketing at InfluxData, pointed out, a time series database, like InfluxDB, can effectively manage the increasing volume of IoT data generated by these sensors. This allows manufacturers to consolidate data from various sources and summarize it for easy analysis and decision-making. “Data is the currency. There is a lot of equipment out there on industrial floors and stuff like that where you want to collect that data,” he said.

Kelsey Hickock, Market Development Manager for Applied Industrial Technologies, further suggested that manufacturers must focus on specific pain points that IoT can help solve and start small with IoT implementation. This enables them to see the benefits of IoT, build confidence, and then scale up gradually. “Start really small, whether that be, if you end up wanting to monitor something, look at starting with one machine or one line and deploy one piece of hardware that could collect maybe five data points,” she advised.

Managing all that data

Industrial operators must be ready to manage the large volumes of data that IoT sensors generate. Data quality management is critical for actionable insights in Industry 4.0.

IoT sensors tend to be more rugged and durable than conventional sensors, making them ideal for industrial environments. They also offer better connectivity and easier integration with IoT platforms. However, ensuring data quality and managing the vast amount of data generated by these sensors can be challenging. Kelsey Hickock suggested that manufacturers can overcome these challenges by using automated tools for preprocessing the data, continuous monitoring, and leveraging the power of edge computing.

Balaji Palani explained the capabilities of InfluxDB in managing the immense quantity of time series data produced by IoT sensors. He also mentioned that it provides flexibility because of the option to include more details using tags. “The beauty about InfluxDB, again, not every tool can do this, but one of the things that InfluxDB does very well is you don’t need to define this data before you can start to collect your data,” he explained.

As for metadata and best practices, Palani noted that tags add context to data in InfluxDB, allowing users to create a more sophisticated understanding of the data. “All of these additional contexts: things like, which machine? Where is it? Is it coming from this sensor, this PLC? Any additional context under which circumstances this data is emitted can be appended to what we call tags,” he said.
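
Palani’s point about tags maps onto InfluxDB’s line protocol, the text format in which each point carries a measurement name, optional tags, one or more fields, and a timestamp. The Python sketch below formats a point that way. The measurement, tag, and field names are hypothetical, and the helper handles only the common value types, so treat it as an illustration rather than a complete encoder.

```python
def _escape(text, chars):
    """Backslash-escape the characters that line protocol treats as delimiters."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # strings are double-quoted


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    tag_part = "".join(
        f",{_escape(k, ', =')}={_escape(v, ', =')}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(k, ', =')}={_format_field(v)}" for k, v in fields.items()
    )
    return f"{_escape(measurement, ', ')}{tag_part} {field_part} {timestamp_ns}"


# Tags answer "which machine? which PLC?"; fields hold the measured values.
line = to_line_protocol(
    "core_leacher",
    {"machine": "leacher 1", "plc": "p7"},
    {"pressure": 1.2, "valve_open": True, "cycle": 42},
    1721721600000000000,
)
```

Because tags travel with every point, no schema has to be declared up front: a new machine or sensor simply shows up as a new tag value.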

InfluxDB is also able to collect data from multiple sources, consolidating data in a central hub and eliminating data silos. Some IoT sensors come with their own infrastructure and data hubs, but by leaning into APIs, open source tools, and standard protocols, industrial operators can build custom data pipelines to ensure the data they need gets to where it needs to be. A plugin-based, open source data collection tool like Telegraf can do a lot of this heavy lifting.

Turning data into intelligence

The panelists agreed that Industry 4.0 and IoT sensors offer manufacturers distinct advantages and business opportunities, including process optimization, real-time monitoring, and predictive maintenance. Palani pointed out that having a system where all of the data can be accessed and acted on uniformly can increase visibility throughout an organization, lower operation costs, and accelerate business value.

Kelsey Hickock noted that manufacturers that effectively harness and act upon sensor data can realize potential business opportunities such as optimizing production processes, reducing waste, improving quality, enhancing overall efficiency, and even generating new business models. “Leveraging your data to transition from selling products to offering services… manufacturers may pay based on the usage, as opposed to owning the equipment,” she suggested.

Getting started

Getting started with this kind of transformation can seem daunting. The speakers emphasized the importance of starting small and scaling gradually with IoT implementations. By focusing on specific pain points that IoT can help solve, manufacturers can see the immediate benefits, build confidence, and then scale up accordingly. This approach can help manufacturers overcome the initial hurdles and apprehensions towards adopting IoT technology.

For more information on InfluxDB and how it works in Industry 4.0, check out this e-book about modernizing data historians.

Watch the full webinar.

Overcoming Connectivity Issues in Distributed Systems: Aerospace

Jason Myers (InfluxData) — Fri, 28 Jun 2024 08:00:00 +0000

Maintenance and repairs for aerospace operations in orbit present a considerable challenge. It’s not easy to dispatch a technician to fix components on a satellite. That’s why it becomes increasingly critical to plan for as many scenarios as possible before launching and deploying these kinds of devices.

To understand what’s happening with orbiting devices, companies need data. If that data stream breaks down, the priority becomes reestablishing it to avoid a very expensive piece of equipment being rendered useless.

There are several different strategies businesses can use to help mitigate or overcome poor or spotty connectivity in distributed systems. Here, we’re going to assume there are any number of edge devices trying to send data to a central hub.

Pre-transmission

Before you try transmitting data, there are strategies you can implement locally to make those transmissions more efficient. Some of these will depend on the resources available on the device, but at the same time, the available resources should reflect the potential needs created by connectivity issues.

Data caching: This is especially useful if you need to use some of the data generated locally. Caching data on the device so it’s usable while the device is waiting to transmit it to the central hub can help keep your system operational.
Compression: This is pretty straightforward. Compress your data as much as possible to reduce the bandwidth needed to send the data to your central hub.
Priority scheduling: Is some data more important than other data? Knowing this ahead of time and prioritizing transmitting that data first ensures that it is more likely to reach the central hub in a limited connectivity window.
Checksums and hashes: If you have concerns about data corruption, you can generate checksums or hashes for that data locally. Including those checksums with the transmission helps the central hub verify the data.
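
The strategies above can be combined in a few lines of code. The Python sketch below compresses each payload, attaches a SHA-256 digest so the receiver can verify it, and keeps pending messages in a local priority queue so the most important data goes out first. The class name, message layout, and priority scheme (lower number means more urgent) are assumptions for illustration only.

```python
import hashlib
import heapq
import zlib


def package(payload: bytes) -> dict:
    """Compress a payload and attach a SHA-256 digest of the original bytes."""
    return {"sha256": hashlib.sha256(payload).hexdigest(), "body": zlib.compress(payload)}


def unpack(message: dict) -> bytes:
    """Decompress a message and verify it against its digest (receiver side)."""
    payload = zlib.decompress(message["body"])
    if hashlib.sha256(payload).hexdigest() != message["sha256"]:
        raise ValueError("checksum mismatch: payload corrupted in transit")
    return payload


class PriorityOutbox:
    """Cache messages locally and release the most important ones first."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker keeps first-in, first-out order within a priority

    def put(self, priority: int, payload: bytes) -> None:
        heapq.heappush(self._heap, (priority, self._counter, package(payload)))
        self._counter += 1

    def drain(self, limit: int) -> list:
        """Pop up to `limit` messages, lowest priority number first."""
        count = min(limit, len(self._heap))
        return [heapq.heappop(self._heap)[2] for _ in range(count)]
```

During a short connectivity window, calling drain with a small limit sends fault codes and other urgent data ahead of routine telemetry, which stays cached for the next window.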

Transmission

There are a lot of approaches you can use to build in safeguards for your data. While this list is by no means exhaustive, hopefully it will help get the conversation started on your end.

Tooling

I’m breaking this section down further because the tooling you ultimately choose may impact your configuration options. So, let’s take a look at some tools that can help manage your data so that you can take advantage of different configuration options more easily.

InfluxDB: This should come as no surprise, but having a local time series database instance helps manage all the data from the various systems and sensors on your devices. In particular, a single-node instance that supports edge data replication (EDR) is ideal. This feature creates a durable, local data queue so that if your connection fails or gets interrupted, the database continues to collect that data and then flushes the queue once connectivity returns.
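
To illustrate the general idea of a durable local queue, here is a minimal store-and-forward sketch in Python backed by SQLite. It is not how InfluxDB implements edge data replication. The table name, the class, and the send callback are hypothetical, and the only assumption is that a failed transmission raises ConnectionError.

```python
import sqlite3


class DurableQueue:
    """Persist points locally and delete them only after a successful send."""

    def __init__(self, path=":memory:"):
        # Use a file path instead of ":memory:" so the queue survives a restart.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS outbox "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, line TEXT NOT NULL)"
        )
        self._db.commit()

    def enqueue(self, line):
        self._db.execute("INSERT INTO outbox (line) VALUES (?)", (line,))
        self._db.commit()

    def flush(self, send, batch_size=100):
        """Send queued lines in order; stop at the first failure and keep the rest."""
        sent = 0
        while True:
            rows = self._db.execute(
                "SELECT id, line FROM outbox ORDER BY id LIMIT ?", (batch_size,)
            ).fetchall()
            if not rows:
                return sent
            try:
                send([line for _, line in rows])
            except ConnectionError:
                return sent  # still offline; everything unsent stays queued
            self._db.execute("DELETE FROM outbox WHERE id <= ?", (rows[-1][0],))
            self._db.commit()
            sent += len(rows)

    def __len__(self):
        return self._db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

The device keeps calling enqueue regardless of link state and calls flush whenever it believes the connection is back, so an outage delays data instead of losing it.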

Kafka: Buffering data in Kafka is another way to combat intermittent or unreliable connectivity. Kafka works differently than a traditional message queue, where a message is removed once an application reads it. Kafka persists messages in a durable, append-only log for a configurable retention period, and each consumer tracks its own position in that log, so a consumer that loses its connection can pick up where it left off. Multiple devices can publish to the same topic, which is helpful if the Kafka cluster has a reliable connection. Like InfluxDB, Kafka scales horizontally very well and is good for distributed systems.

Configuration

Between InfluxDB and Kafka, you should be able to collect and store your data. Configuring application logic to work around connectivity issues is a whole different ball game. Here are some concepts and approaches to consider.

Dynamic Adjustment: This involves adapting transmission rates and methods based on current connectivity conditions. During periods of poor connectivity, you want this application logic to reduce the transmission rate or switch to more robust protocols for transmitting data.
Forward Error Correction (FEC): This approach adds redundant data to each transmission so the receiver can detect and correct errors without the need for retransmission, which matters when round trips are slow or connectivity windows are short. For many codes, encoding is cheaper than decoding, so most of the burden falls on the receiver, which helps if your edge devices have limited resources. The checksums and hashes mentioned above complement FEC rather than replace it: on their own they can only detect corruption, not repair it, but they let the receiver confirm that corrected data is intact.
Edge Computing: This is where having a database at the edge is helpful. If you have the resources available, you can process data locally at the edge, sending only cleaned and processed data to the central hub. This minimizes the total amount of data you need to transmit.
Delay Tolerant Networking (DTN): This is a store-and-forward approach. DTN protocols are ideal for environments that experience long delays and disruptions. This is similar to a firefighter bucket brigade, where data is stored at intermediate nodes until a connection is available to forward it to the central hub.
Optimized Routing Algorithms: There are a couple of options that fall into this category.
- Opportunistic Routing: This involves writing your application logic to take advantage of any available communication opportunity to forward data. You can do this by choosing paths dynamically based on current network conditions.
- Multipath Routing: Instead of relying on a single data transmission path, consider configuring multiple paths to increase the chances of successful delivery. If you’re sending this data to a central InfluxDB instance, the database has automatic deduplication. This approach might end up resulting in more aggregate data transmission, so it might be best to keep this as a backup rather than a primary strategy for overcoming connectivity issues.
Protocols for Low-Bandwidth and High-Latency Networks
- Lightweight Protocols: Where it makes sense, you can use communication protocols designed for low-bandwidth environments, such as Constrained Application Protocol (CoAP) instead of HTTP.
- High-Latency Protocols: Similarly, lean into established protocols that can handle high latency, such as TCP variants optimized for long-distance communication.
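
Two of the ideas above are small enough to sketch in Python. The first function adjusts the send interval based on whether the last transmission succeeded (dynamic adjustment). The other two show the simplest possible form of forward error correction: one XOR parity packet per group, which lets the receiver rebuild any single lost packet without a retransmission. The function names, the interval limits, and the equal-length packet requirement are assumptions for illustration, and real systems typically use stronger codes such as Reed-Solomon.

```python
from functools import reduce


def next_send_interval(current_s, last_send_ok, floor_s=1.0, ceiling_s=300.0):
    """Halve the interval after a successful send, double it after a failure."""
    proposed = current_s / 2 if last_send_ok else current_s * 2
    return max(floor_s, min(ceiling_s, proposed))


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(packets):
    """Append one XOR parity packet to a group of equal-length packets."""
    if len({len(p) for p in packets}) != 1:
        raise ValueError("packets must be non-empty and of equal length")
    return list(packets) + [reduce(xor_bytes, packets)]


def recover(received):
    """Rebuild a group in which at most one packet (marked None) was lost."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost packet")
    if missing:
        survivors = [p for p in received if p is not None]
        received = list(received)
        # XOR of all surviving packets (data plus parity) equals the lost packet.
        received[missing[0]] = reduce(xor_bytes, survivors)
    return received[:-1]  # drop the parity packet, return the data packets


group = add_parity([b"AAAA", b"BBBB", b"CCCC"])
damaged = [group[0], None, group[2], group[3]]  # second packet lost in transit
restored = recover(damaged)
```

The trade-off is visible in the group size: one extra packet per three sent is a 33% overhead, paid on every transmission in exchange for not having to wait for a retransmission when a packet is lost.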

Wrapping up

While it’s unlikely that any single configuration option is the silver bullet that aerospace companies are looking for, some combination of dynamic application logic and having the right tools to collect and manage your data can make serious inroads against unreliable or intermittent connectivity issues.

To learn more about how aerospace companies use InfluxDB, click here. Try out InfluxDB for free, here.

Scaling Data Collection: Solving Renewable Energy Challenges with InfluxDB

Jason Myers (InfluxData) — Thu, 06 Jun 2024 08:00:00 +0000

For data-critical and data-intense sectors, like energy and renewables, access to data can be a make-or-break situation. As the complexity of the systems underpinning energy operations increases, collecting and analyzing that data is more challenging than ever before. Therefore, understanding what data sources are necessary, where they sit in the tech stack, and how they scale across an organization is crucial for obtaining the insights energy companies need to maintain and optimize operations.

Why scalability matters

When we think about scalability, it’s either horizontal or vertical. That’s not to say that companies need to choose one or the other; rather that their needs tend to surface independently.

The very structure of modern energy grids and sources requires much greater attention than older systems. Traditional energy production occurs in a central location, making tracking easier. The inputs and outputs are consistent and require less frequent attention.

Horizontal scaling

In the renewable energy space, the number and location of sources are often widely distributed. This means that energy producers need to monitor more ‘plants’ as well as the connections between those sources and the grid.

Because many renewable energy sources are intermittent, energy production and storage from individual devices (e.g., solar panels, wind turbines, etc.) must be monitored. The sun only shines during the day, and wind turbines only turn when the wind blows! Feeding that information into machine learning models can help optimize how companies use that energy. All this requires data.

Consider, too, all the residential or individual commercial facilities that generate energy and put it back into the grid (e.g., solar panels). Energy companies need to keep precise track of that information.

Vertical scaling

Finally, we need to factor in the systems doing the actual monitoring. Companies need to ensure that their operations technology (OT) stacks remain in good working order, too. Having a plan to watch the watchers is something companies need to consider. Monitoring these stacks requires vertical scalability as they grow and become more complex.

Data challenges

As systems become more complex and distributed, data workloads for energy companies become more demanding. The challenge with time series data is that there is a lot of it. A positive aspect is that the more data you have, the deeper insights you can reveal. When we instrument everything, we have insight into how those things operate and change. The more granular we get, e.g., microsecond or nanosecond precision, the more accurate those insights become.

However, deeper insights come with trade-offs. First, you need a solution that can process and manage high-resolution data. InfluxDB 3.x is a time series database that can ingest millions of data points every second and supports nanosecond data precision. In other words, it can ingest and make data available as fast as your equipment can generate it. (This is very different from legacy data historians.)

Storage becomes a challenge once you have high-resolution data because the more data you have, the more it costs to keep it. This is especially true where companies want to do forecasting and predictive analytics because these processes require as much high-resolution data as possible to build and train the AI/ML models that underpin them.

As a result, historically, companies could keep high-resolution data long-term, enable AI/ML optimization, and spend more to store it all. Or, they could downsample that data, keep the aggregation, and save money on storage, but drastically limit their ability to generate insights or optimize and embrace the advantages that Industry 4.0 offers.
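
To make the trade-off concrete, here is a minimal Python sketch of downsampling, assuming one reading per second and a simple mean over fixed windows. The `downsample` helper and the sample values are made up for illustration. Real pipelines would typically run an aggregation query in the database instead.

```python
from collections import defaultdict
from statistics import mean


def downsample(samples, window_s):
    """Average (timestamp_s, value) samples into fixed windows of window_s seconds."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its window.
        buckets[ts - ts % window_s].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# One reading per second for two minutes, with a one-second spike at t=30.
raw = [(t, 100.0 if t == 30 else 20.0) for t in range(120)]
per_minute = downsample(raw, 60)
print(len(raw), "->", len(per_minute), "rows")
print(per_minute)
```

The row count drops from 120 to 2, but the spike to 100.0 is flattened into a first-minute average of about 21.3. That lost detail is exactly what forecasting and anomaly detection models tend to need.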

Scaling for time series workloads

When we combine the realities of distributed industrial operations with the realities of data generation, it quickly becomes apparent why scalability matters. End-to-end monitoring can be a challenge, but failure to do so can lead to unpredictable and costly issues down the line. Furthermore, when we look at the energy sector as a whole, both vertical and horizontal scalability emerge in different ways. Some companies may play in all areas, and others may choose to specialize. However, opportunities in this sector are often a function of the need for scalability.

Data normalization and edge data replication

On the energy production side, there is no shortage of devices and systems to monitor. This is especially true for renewable energy sources like wind and solar. Energy-generating devices often use different sensors, even within the same array, which can mean they push out data in different formats and protocols. To make sense of that data holistically, companies need to normalize it.

In the diagram below, you can see Telegraf used to collect data from various protocols (e.g., Modbus, MQTT, OPC-UA, etc.) and output it in InfluxDB’s line protocol. You can accomplish this using a single-node instance of InfluxDB at the edge. Individuals working onsite can use that data at the edge to monitor local systems in real-time. These edge nodes use InfluxDB’s edge data replication (EDR) feature to create a durable queue that automatically sends data to a centralized data store. This architecture enables both data access and analysis at the edge and couples it with data resiliency at the center.
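
Telegraf handles this normalization through its input and output plugins, so no custom code is needed in practice. Purely to show what "normalize" means here, the Python sketch below converts two differently shaped readings into the same line protocol form. The payload shape, the register scaling (tenths of a watt), and all function, tag, and field names are assumptions for illustration, and the sketch skips the escaping that line protocol requires for spaces and commas.

```python
import json


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render one point as line protocol; float fields only, to keep the sketch short."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"


def from_mqtt_json(payload, timestamp_ns):
    """Hypothetical MQTT payload: {"panel": "p7", "watts": 312.4}."""
    msg = json.loads(payload)
    return to_line_protocol("solar", {"panel": msg["panel"], "source": "mqtt"},
                            {"power_w": msg["watts"]}, timestamp_ns)


def from_modbus_register(panel_id, raw_register, timestamp_ns):
    """Hypothetical Modbus register holding power in tenths of a watt."""
    return to_line_protocol("solar", {"panel": panel_id, "source": "modbus"},
                            {"power_w": raw_register / 10}, timestamp_ns)


ts = 1717660800000000000
print(from_mqtt_json('{"panel": "p7", "watts": 312.4}', ts))
print(from_modbus_register("p8", 3124, ts))
```

Both sensors end up in the same `solar` measurement with the same `power_w` field, which is what makes it possible to query them together.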

Architecture diagram showing data ingest with Telegraf, single-node InfluxDB, and EDR enabled at the edge, transmitting data to a central hub.

Energy distribution

There are many ways to get energy from its source to its final destination and plenty of things to monitor along the way.

In a more traditional power grid, we can see both horizontal and vertical scaling realities. In the diagram below we have two distributed power station networks that make up one vertical layer and a horizontal layer. We see the same thing with the substations fed by the power stations in the previous layer. You can use InfluxDB within each plant facility to collect, store, and analyze its data. The goal here is typically to understand faults when and where they occur to accelerate maintenance. These aren’t self-repairing facilities, so real-time insights may not be necessary. However, energy companies still want to identify root causes as quickly as possible to optimize schedules for their maintenance staff. With enough data, energy companies can leverage machine learning to predict when errors will occur and be more proactive in troubleshooting them. The name of the game is monitor, analyze, predict, and repeat.

Virtual power plants

Storage and strategic release of energy are the key motions behind virtual power plants. The cost of a kilowatt-hour of energy varies by time of day. Periods of high energy usage can cost orders of magnitude more than periods of low usage. Companies can take advantage of this situation by storing energy produced during the low periods and releasing it during the high periods.

What does this look like in practice, and where does data fit into the story? The energy generation source in this situation is usually renewable, like wind. Companies typically store energy in battery arrays. Therefore, they need to monitor not only the wind turbines but also the batteries, assuming they control those. They need to be able to track battery performance on both ends of the process. That means understanding battery capacity as energy comes in and when it goes out. They also need to track the price of energy throughout the day so that they can maximize the value of stored energy.
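
The arithmetic behind that decision can be sketched in a few lines of Python. This is a simplified model, not how any particular operator schedules storage: it assumes a single charge and discharge per day, a full battery cycle, and a round-trip efficiency of 0.9, which is an assumed figure. The function name and the prices are hypothetical.

```python
def best_cycle(prices_per_kwh, capacity_kwh, round_trip_efficiency=0.9):
    """Find the most profitable single charge/discharge cycle in a day.

    Returns (charge_hour, discharge_hour, profit). Charging must come first.
    """
    best = (None, None, 0.0)
    cheapest_hour = 0
    for hour, price in enumerate(prices_per_kwh):
        # Track the cheapest hour seen so far as the candidate charge time.
        if price < prices_per_kwh[cheapest_hour]:
            cheapest_hour = hour
        cost = prices_per_kwh[cheapest_hour] * capacity_kwh
        # Only part of the stored energy comes back out of the battery.
        revenue = price * capacity_kwh * round_trip_efficiency
        if revenue - cost > best[2]:
            best = (cheapest_hour, hour, revenue - cost)
    return best


hourly_prices = [0.10, 0.05, 0.08, 0.30, 0.20]  # currency units per kWh
print(best_cycle(hourly_prices, capacity_kwh=1000))
```

With these sample prices, the best plan is to charge in hour 1 and discharge in hour 3. Both inputs to the calculation, battery capacity and price over time, are time series that the operator has to collect.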

Ju:niz Energy is an example of a company that uses batteries to store energy. Ju:niz developed intelligent, large-scale energy storage systems that collect 1.3M data points every second about battery health, climate, temperature, and other conditions. Ju:niz uses the Modbus protocol to connect to the iEMS SPS controllers on site. They collect the data from the controller with open source Telegraf, the data collection agent for InfluxDB, and write it to an open source instance of InfluxDB at the edge. Ju:niz sends data from all its local InfluxDB OSS instances to a central, AWS-hosted, Cloud Dedicated cluster using EDR. To learn more about this type of energy storage, check out the complete case study on ju:niz Energy to understand how it works and where InfluxDB sits in the system.

Ju:niz Energy architecture diagram

Companies working in this area can also control the energy that consumers with solar panels put back into the grid. In some areas, companies can buy the energy generated by individual households. Companies can offer higher rates than public entities to purchase power generated by individual households in the same way that they can maximize value by strategically releasing energy into the grid. These companies need to monitor the amount of energy coming into the system through these agreements.

Monitoring energy at scale

This post barely scratches the surface regarding data, scalability, and the energy sector. But even this brief overview demonstrates that a need to use data to monitor, analyze, and predict exists along both vertical and horizontal axes. With so many sources generating data at so many levels, the ability to collect, organize, and manage that data, and to turn it into actionable insights, becomes mission-critical. InfluxDB has the capabilities and features to ensure energy companies can see what their systems are doing, derive deep insights from data, and power advanced analytics (AI/ML) to optimize and improve systems and processes up and down the sector.

Click here to learn more about InfluxDB and how it works with energy/renewables.

What to Expect When You’re Expecting InfluxDB: A Guide

Jason Myers (InfluxData) — Tue, 14 May 2024 08:00:00 +0000

Well, you’ve done it. You decided to take the plunge with InfluxDB. While vast and diverse possibilities await, you may have more short-term concerns. Namely: now what?

Getting started looks different for everyone because no two users are doing the exact same thing. This post is primarily aimed at InfluxDB Cloud Dedicated and InfluxDB Clustered users, or users of any other product that includes a support agreement. (You can chat with one of our sales folks if you have questions about that.)

Our aim is to equip you with best practices and a clear set of expectations from the get-go. For those of us who like to read the instructions before we start putting that IKEA furniture together, this will make sense. If you’re the type to step up to the plate with a bandolier of hex wrenches and a disdain for instructions, this is one project where you may want to take a more cautious approach.

The goal here is to reduce (or eliminate!) headaches or issues when using InfluxDB.

Before you start

One of the most important things to do is make sure you get the correct product for your workload. InfluxDB has a range of options to fit workloads of any size. While the following tips are true for all users, they’re critical for those with large workloads, which is why we’re focused on InfluxDB Cloud Dedicated and InfluxDB Clustered.

Schema design

Yes, InfluxDB is a schema-on-write database, which is really useful for workloads that change shape. However, InfluxDB also allows users to design their schema. Being able to map data to tags or fields can optimize your data collection and analysis. Here are a few things to keep in mind when you’re designing your schema.

Number of columns: Currently, an InfluxDB 3.x measurement supports a maximum of 250 columns. One is reserved for timestamps, giving you 249 columns for tags and fields. That’s a lot of columns! (If you need a refresher on InfluxDB’s line protocol data model, go here.) If you need more than 249 columns, you might think about narrowing your schema before you start writing a lot of data.
- For example, instead of putting all the data from one plant into a single measurement, perhaps create a measurement for each machine in the plant and then roll those up to a single dashboard. That way, you can collect higher-resolution data on each device without running out of columns for your entire plant.

In the left panel, multiple machines send data to the same measurement, increasing the number of rows necessary. In the right panel, each machine has its own measurement.

Data types: Make sure you know what type of data your device(s) produce. Different components of InfluxDB line protocol accept different data types. For example, tag keys and values, along with field keys, must be unquoted strings, while field values can be quoted strings, floats, integers, unsigned integers, and booleans.
Tags: Tags are metadata that contextualize your data. Tags aren’t required but we strongly recommend using them. For example, if you have robotic arms in multiple plant locations, you would use tags to designate which robot is in what facility.
Fields: As mentioned above, you have a lot of flexibility when it comes to field data types. With that said, you need to be careful to avoid field-type conflicts. These occur when you have a field mapped to a data type but try to add a different type to the same field. InfluxDB is schema-on-write, and any undesignated numeric values are parsed by default as floats. Some client libraries may inherit data types from their own typing system and assign them to the field. However, you can explicitly designate field types in raw line protocol or through your client library of choice.
- If you want to write a numeric value as an integer, you have to add the ‘i’ suffix to the value. So value=96.0 becomes value=96i. (Don’t wrap the number in quotes; a quoted value is written as a string.) If you need to change field data types, you can use Telegraf and the Converter processor plugin to convert data types. (Be sure to read the documentation to ensure the plugin fits your use case.)
Professional Services: Customers with support contracts can also leverage InfluxDB’s professional services team to get help with schema design and best practices. Remember, it’s better to take advantage of this before you start writing data to prevent additional work down the line. Check with your account manager for more information on this option.
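
To make the field typing rules above concrete, here is a small Python sketch that renders values as line protocol field values: integers get the ‘i’ suffix, floats are left bare, strings are double-quoted, and booleans become true or false. The helper names, measurement, tag, and field names are all hypothetical, and the sketch skips the escaping that tag values and keys need. In practice a client library would do this for you.

```python
def format_field_value(value):
    """Render a Python value as a line protocol field value."""
    if isinstance(value, bool):        # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"             # the 'i' suffix marks an integer
    if isinstance(value, float):
        return repr(value)             # unsuffixed numbers are parsed as floats
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'              # strings are double-quoted


def to_line(measurement, tags, fields, timestamp_ns):
    """Build one line of line protocol from tags and typed fields."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={format_field_value(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


line = to_line(
    "robot_arm",
    {"facility": "austin"},
    {"cycles": 96, "load": 96.0, "ok": True, "mode": "auto"},
    1715673600000000000,
)
print(line)
# robot_arm,facility=austin cycles=96i,load=96.0,ok=true,mode="auto" 1715673600000000000
```

Writing `cycles=96` on one line and `cycles=96i` on another is the kind of mismatch that produces a field-type conflict, so it helps to fix each field's type in one place like this.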

Partitioning

When you query data, the amount of data the database needs to sift through to find what you need impacts the query response time. The more data the query needs to go through, the longer it takes. By default, InfluxDB 3.x partitions data by day and persists that data as Apache Parquet files. Partitioning your data splits it into smaller, logical groups so that queries can target smaller data sets and return results faster.

The best way to partition your data will depend on your data and what kinds of queries you want to run against it. For example, if you’re collecting temperature data from across the country, you might want to partition the data by city, state, region, day, or month. Just be sure you don’t create too many partitions because that can also impact query time. As with anything database related, you will need to find the balance and weigh the tradeoffs between query response time and storage.
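
The idea can be sketched in Python as follows. This is only an illustration of how a partition key built from a tag value and a time bucket lets a query skip unrelated data; it is not InfluxDB's implementation, and the function names, the `state` tag, and the `|` separator are assumptions.

```python
from datetime import datetime, timezone


def partition_key(tags, timestamp_s, tag_keys=("state",), time_format="%Y-%m-%d"):
    """Build a partition key from selected tag values plus a time bucket."""
    parts = [tags[k] for k in tag_keys]
    day = datetime.fromtimestamp(timestamp_s, tz=timezone.utc).strftime(time_format)
    parts.append(day)
    return "|".join(parts)


def partitions_to_scan(all_keys, state, day):
    """A query filtered on state and day only needs the matching partition."""
    return [k for k in all_keys if k == f"{state}|{day}"]


readings = [
    ({"state": "TX", "city": "Austin"}, 1715673600),
    ({"state": "TX", "city": "Dallas"}, 1715677200),
    ({"state": "CA", "city": "Fresno"}, 1715673600),
    ({"state": "TX", "city": "Austin"}, 1715760000),
]
keys = sorted({partition_key(tags, ts) for tags, ts in readings})
print(keys)
print(partitions_to_scan(keys, "TX", "2024-05-14"))
```

Here four readings fall into three partitions, and a query for Texas on a single day touches only one of them. Adding `city` to `tag_keys` would make each partition smaller but multiply the number of partitions, which is the trade-off described above.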

Check out the partitioning best practices in our docs for more information.

After you start

For users who have support agreements, like those using InfluxDB Cloud Dedicated and InfluxDB Clustered, the InfluxDB support team will reach out directly to your organization once you’re ready to get going. The team makes sure all your licenses are in order and, in the case of Dedicated, provisions your cluster. At this time, our team will also confirm who from your organization should have access and get them set up properly. We’ll also schedule a call with one of our Support Engineers to begin the onboarding process. During this initial call, we can help with any schema or partition checks.

Following the initial onboarding, we will proactively schedule a system health check-in. We conduct these on a quarterly basis. The health check-in is an opportunity for us to analyze and understand your production environment(s) and to make sure that we understand how your organization defines success so that we can help you achieve it. This may include discussions around utilization and potential growth opportunities, queries and query optimization, outages, resizing, any other best practices that may help your use case, and a review of any issues or support tickets your organization filed since the last check-in.

At the end of your check-in, we will schedule the next one so that we have a consistent line of communication and ensure you always have a venue to surface questions or issues.

Final thoughts

Plenty of resources exist to help you get the most out of InfluxDB. For those with support contracts, it’s helpful to understand what you can expect from that agreement. Support can be a big difference-maker for some organizations. With regular system health checks, our support team works diligently so you have everything you need to reach your goals.

Learn more about InfluxDB 3.x here.

Infrastructure Monitoring Basics: Getting Started with Telegraf, InfluxDB, and Grafana

Jason Myers (InfluxData) — Fri, 05 Apr 2024 08:00:00 +0000

Ensuring the reliability and performance of applications and systems is vital to a healthy infrastructure. With the exponential growth of data, traditional monitoring approaches fall short of providing real-time insights and proactive problem-solving. That’s where InfluxDB comes into play, offering a robust and scalable solution for all your monitoring needs.

The webinar covers the basics of infrastructure monitoring using the TIG stack: Telegraf, InfluxDB, and Grafana. The session spans a range of topics, including the difference between monitoring and observability, using Telegraf for data collection, InfluxDB for data storage, and Grafana for visualizing and acting on data. To illustrate these concepts, Developer Advocate Anais Dotis-Georgiou presents hypothetical problems and demonstrates how to use the mentioned tools to solve them.

Highlights

1. The combination of Telegraf, InfluxDB, and Grafana enables comprehensive infrastructure monitoring

Anais discussed how to use these three tools for effective infrastructure monitoring. Telegraf, an open source agent, is used for data collection. “Telegraf is our open source plug-in, open source agent for collecting metrics and events. It’s plug-in-driven and has over 12,000 stars on GitHub,” she noted.

InfluxDB, a time series database, is the storage component. InfluxDB 3 is built on the FDAP stack, which includes Apache Flight, DataFusion, Arrow, and Parquet, enabling it to handle high volumes of time series data. These open source technologies also make it easy to extend and integrate InfluxDB with other tools and systems. As a result, InfluxDB 3 offers more interoperability, allowing developers to use a variety of Python libraries and other tools for ETL.

The final tool is Grafana, primarily used for data visualization and alerting. “We use Grafana as the observability hub… and we can use both the Flight SQL plugin or the official InfluxDB v3 plugin and the Jaeger data source to query data from InfluxDB 3, where we consolidated all of our logs, traces, and events and metrics,” she elaborated. “Grafana and InfluxDB have a really great and longstanding relationship. It’s the main visualization tool that we expect our users to use with InfluxDB.”

2. InfluxDB 3 brings significant improvements for handling time series data

InfluxDB is designed to handle time series data, and the new version, InfluxDB 3, brings notable improvements. Enhanced storage and compression allow users to work with and store large volumes of data using less space. It also ingests data faster than previous versions and allows users to query that data in real-time using SQL.

Anais further highlighted how InfluxDB 3 allows developers to ingest logs, traces, and events in addition to metrics. This flexibility and the database’s ability to handle high volumes of data make it an ideal solution for IoT, analytics, and cloud-native services.

“We have both cloud and edge-based offerings for InfluxDB. And so, there might be some use cases where a user wants to keep their data closer to their source. So what they might do is downsample and aggregate their data before writing their data to a more globally visible store,” she noted.

3. The versatility of Telegraf makes it a valuable tool for data collection

Telegraf, an open source data collection agent, is highly versatile due to its plug-in-driven nature. Anais described how it supports over 300 plugins for ingesting and outputting data, making it one of the most adaptable ingest agents for time series data.

“Telegraf is our open source agent for collecting metrics and events. It’s plug-in-driven and has over 12,000 stars on GitHub,” she stated. Anais emphasized how community-driven Telegraf is, with the majority of the plugins contributed by the community.

She also explained how Telegraf can be configured through a single file and how a variety of flags let you test configurations before committing to them. This ease of configuration and testing makes Telegraf a user-friendly tool for data collection.

The example in the webinar bore this out. As Anais explained, “We used Telegraf as our collection backbone. We deployed it on all our servers and cloud infrastructure to collect OpenTelemetry data, Prometheus, and CloudWatch data, as well as raw server-based metrics.”

Next steps

When it comes to infrastructure monitoring, time series data is critical. And when it comes to time series data, InfluxDB is the solution that gives you total control over your data and lets you do more with it.

To try out InfluxDB for yourself, sign up for a free account today.

Data Historians vs. Time Series Databases

Jason Myers (InfluxData) — Wed, 13 Mar 2024 08:00:00 +0000

It’s easy to pitch technology buying decisions as black or white, where one camp is the promised land and the other is a dystopian wasteland where companies and profits go to die. But that doesn’t match reality.

Instead, organizations need to balance technical trade-offs with their needs. So, while it’s easy to stand atop the “rip and replace” mountain and shout the virtues of your new technology, that’s not something that most organizations are willing to do.

In the industrial and manufacturing space, data historians were a key element of Industry 3.0, where computers took center stage. The fact remains that progress from Industry 3.0 to Industry 4.0 is incremental. Some companies may move from Industry 3.0 to 3.5 to 3.7 before landing in Industry 4.0. As technology evolves and organizations embrace Industry 4.0, how must they adapt? These incremental changes look different for every organization.

Before making any grand decisions, it’s important to understand the players in the game. For the purposes of this article, we’re talking about InfluxDB, a purpose-built time series database, and legacy data historians.

	InfluxDB (TSDB)	Data Historian
Domain-specific	No	Yes
Open/Closed system	Open source	Closed (Proprietary)
Deployment environment	Cloud, Edge, On-Prem	On-Prem
Interoperability	Extensive (Open source, APIs, cloud-native)	Limited
Build/Buy	Build	Buy
Scalability	High	Limited
OT integration	Supports common protocols, customizable	Tight
End-to-end solution	Not out of the box, but you can build one	Yes
Growth potential	Unlimited	Limited by vendor resources and goals

Data Historian: Pros

Data historians aren’t all bad. A technology like the data historian doesn’t become commonplace in a given sector if it doesn’t work well.

Domain-specific: Vendors build data historians for industry and industrial applications. These systems focus on the unique features and needs of industrial environments and provide tools that work with PLCs, SCADAs, individual machines, and more.
OT integrations: Data historians tightly integrate with operations technology (OT) control systems and standards.
End-to-End: Data historians have features for pretty much any requirement that industrial operators need. Newer data historians even offer rich UIs to visualize data. Data historians are more of an “all inclusive” option, providing a wide range of features and capabilities in a single solution. They may not be turn-key, but they’re much closer to that than DIY.

Data Historian: Cons

While those pros all sound pretty good (that’s why they’re positives), we have to remember the context of this consideration is the move to Industry 4.0. Data historians are great self-contained solutions, but what happens when you want to do more with your data?

Legacy tech: Closed software systems create “walled gardens” that limit organizations’ ability to adapt, innovate, and grow. Data historians are a tool designed for one job. But what happens if multiple people need to use the same data for different purposes? This leads to the next point…
Vendor lock-in: The walled garden makes it almost impossible to integrate with modern data ecosystems. Closed, proprietary software makes it difficult, if not impossible, to integrate with tools beyond what the vendor is willing to support. As other systems advance, new protocols and standards emerge, and vendors prioritize interoperability, a closed system limits what you can do with your data.
Data silos: These systems typically run on-premises, so when your data historian can’t connect to other systems, the data in the historian becomes siloed. Some organizations may even have multiple data historians running at a single site. The closed nature of these systems prevents users from collating data and drawing insights across systems, which is another factor that limits your ability to generate value from your data. If other systems can’t access your data, then they can’t benefit from it either.
Cost: Because they’re such niche systems, data historians tend to be expensive. Custom changes, if available, are expensive and time-consuming because vendors are committed to their proprietary standards and have limited development resources.

InfluxDB: Pros

Open technology: InfluxDB is built on open technologies, allowing for a lot of flexibility in application development. Leveraging modern technologies and open standards gives users access to best-in-class services and tools. These capabilities enable teams to adapt, grow, develop, and iterate applications faster. Access to a larger ecosystem, APIs, connectors, widely adopted industrial protocols, and third-party tooling lets developers choose the tools they prefer and integrate them with InfluxDB. Accessibility and interoperability are critical components of Industry 4.0, and that’s where a TSDB like InfluxDB shines.
Query languages: InfluxDB supports SQL and the SQL-like InfluxQL query languages. SQL is basically the lingua franca of the digital age, reducing start-up time for many users. Having multiple ways to query data provides another degree of flexibility to users. InfluxDB also supports client libraries in multiple languages to make writing data easier.
Scalability: There are a couple of aspects of scalability worth mentioning. First, there are the database resources and infrastructure. As a cloud-native database, InfluxDB is available as a fully-managed service. Residing in a major cloud environment enables InfluxDB to scale up and down with users’ needs. Another aspect is its ability to scale to meet growing data ingest needs. InfluxDB is designed to handle large time series datasets without impacting performance.
Lower storage costs: This is especially pertinent to industrial organizations that want to take advantage of predictive analytics and other advanced analytics and artificial intelligence tools. These organizations need highly granular, historic data to feed and continually train AI and machine learning algorithms. Storing that data can be expensive, which is why InfluxDB separates compute and storage and utilizes multiple storage tiers. Cold storage, for infrequently accessed data, lives on low-cost object store and can reduce storage costs by 90%+.
Multiple deployment options: InfluxDB is cloud-native, but it is also available for on-premises deployment for those users who want or need to control their infrastructure. Single-node instances for edge deployment also allow organizations to bring data collection and processing closer to data sources, which can then replicate that data back to a centralized instance, if desired.

InfluxDB: Cons

No solution does everything for everyone all the time, and databases are no exception. That’s precisely why specialty databases exist.

Not domain-specific: InfluxDB isn’t built specifically for industrial or manufacturing applications. It doesn’t have the features that data historians do baked-in and ready to go from the outset. Adding those things would take additional time and effort.
Build vs buy: When you opt for a time series database like InfluxDB, you know going in that you’ll need to build some stuff. This requires domain knowledge (or at least the time/willingness to learn) and developer resources.
Stack needed: Related to the build/buy idea, because InfluxDB isn’t domain-specific, building an end-to-end solution requires using a wider ecosystem to get comparable features to a data historian. Some organizations don’t want to learn or manage an entire ecosystem, or don’t have the resources to do so.

Deployment: crawl, walk, run options

To get back to our initial idea, when choosing data historians and/or time series databases, you should consider your organization’s needs and what solutions best fit them.

The following examples are just that: examples. Every organization will have different needs and require different trade-offs, but hopefully these examples will provide a jumping-off point for thinking about the relationship between your data historian, your needs, and where a time series database fits into the picture.

Crawl

Let’s say your data historian works fine for your organization, but you are curious about digital transformation. No doubt you’ve put years and tons of money into your OT stack, so you’re not going to go around pulling plugs for an experiment.

One approach you might take is to use Telegraf to test data collection. Essentially, you can configure Telegraf to collect data from the same sources your data historian collects from. You would want to start with a small data set to keep storage costs down because you’d be writing the same data to two different places.

But doing this gives you a sense of where to locate InfluxDB in your OT stack. And it yields legitimate production data to experiment with to see what kind of insights you can gain.

Walk

Let’s take this example to the next level. Remember when we mentioned that a plant may have multiple siloed historians running at the same location? Well, replicating the above experiment for each historian (but outputting the data of each Telegraf instance to a single instance of InfluxDB) allows you to combine those data streams and break down those silos.

Using a visualization tool like Grafana allows you to create a single pane of glass to track the individual performance of each system as well as their collective performance.

Run

Once you get a feel for how InfluxDB can function with your data historian on a smaller level, you can build out that integration. Connect more systems and tools to InfluxDB. Investigate other ecosystem tools that you can use to replace data historian features. One benefit of open technologies is that you can customize these replacements to meet your specific needs.

If you’re expanding operations and your TSDB experiments are going well, it may be time to adopt a TSDB instead of an expensive legacy data historian. But the point is that you can ease your way into open standards and a time series database. It doesn’t have to be all rip-and-replace. You want to ensure the technologies you use meet your needs, so you need to make sure that the trade-offs from a data historian to a TSDB make sense.

That said, organizations that are serious about digital transformation will likely be in a position—sooner rather than later—where they need the connectivity, interoperability, and accessibility of open standards to remain competitive. Legacy technologies are legacy for a reason. But fortunately, future-proofing your systems doesn’t have to take place overnight. It’s ok to be at a 3.6 on the Industry 3.0 to 4.0 transition spectrum. Options exist. You just need to determine what trade-offs, e.g., costs, features, capabilities, etc., are acceptable for your organization and plan accordingly.

To start experimenting with these ideas and InfluxDB, sign up for a free account today.

Additional resources: