InfluxData Blog - Jessica Wachtel

Introducing the Time Series Buying Guide for IIoT

Jessica Wachtel (InfluxData) — Wed, 29 Jan 2025 07:00:00 +0000

All machinery and equipment, including their controls and sensors, tell a story through the data they collect. This data, or Industrial Internet of Things (IIoT) data, provides a detailed narrative about the machines, offering actionable insights to improve operations. IIoT data empowers businesses to optimize and enhance industrial processes by detailing operational status, performance metrics, usage patterns, health diagnostics, and environmental conditions.

The value of leveraging IIoT data

Harnessing IIoT data can significantly improve operational and business efficiency. When fully utilized, it transforms raw information into tangible benefits, such as:

Real-time anomaly detection and intervention
Maximized productivity, minimized waste and downtime
Reduced unplanned outages and improved maintenance forecasting

Extracting the full value from machines, equipment, industrial controls, and sensor data drives increased revenue by reducing outages, optimizing processes, and lowering error rates. However, achieving these benefits requires tools that preserve and enhance the quality of datasets, ensuring they remain actionable and accurate.

The challenges of managing IIoT data

IIoT data captures changes over time, from subtle fluctuations to catastrophic shifts. This temporal context makes it “time series” data—a sequence of data points collected or recorded at regular intervals. Managing time series data poses unique challenges due to its scale, speed, and complexity:

Massive scale: Continuous high-speed, high-volume data streams
Real-time action: The need for immediate analysis and response within data streams
Data cardinality: High numbers of tags collected result in high cardinality, which can strain system performance
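
To make the cardinality point concrete, here is a minimal Python sketch that counts distinct series in a batch of points. It assumes a series is identified by its measurement name plus its full tag set. The plant layout, tag names, and function names are hypothetical and chosen only for illustration.

```python
from itertools import product

def series_cardinality(points):
    """Count distinct (measurement, tag set) combinations in a batch of points."""
    series = set()
    for measurement, tags in points:
        # Assumption: a series is identified by measurement plus the sorted tag set.
        series.add((measurement, tuple(sorted(tags.items()))))
    return len(series)

def worst_case_cardinality(tag_values):
    """Upper bound: the product of the number of distinct values per tag."""
    total = 1
    for values in tag_values.values():
        total *= len(set(values))
    return total

# Hypothetical plant: 3 lines x 4 machines x 5 sensor types.
tag_values = {
    "line": ["L1", "L2", "L3"],
    "machine": ["M1", "M2", "M3", "M4"],
    "sensor": ["temp", "vibration", "pressure", "current", "rpm"],
}
points = [
    ("machine_metrics", {"line": l, "machine": m, "sensor": s})
    for l, m, s in product(*tag_values.values())
]

print(worst_case_cardinality(tag_values))  # 60
print(series_cardinality(points))          # 60
```

Adding one more tag with ten distinct values would multiply the upper bound by ten, which is why the number of tags collected matters so much for system performance.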

Businesses can easily generate billions of IIoT time series data points per second, demanding ingestion and storage solutions that keep pace with real-time analytics.

Why high-fidelity data matters

Successfully managing these challenges results in high-fidelity data—complete and accurate datasets that provide precise insights. High-fidelity data enables businesses to detect issues early, build accurate analytics, and create reliable models.

Conversely, failing to address these challenges results in incomplete datasets, which lack the necessary precision to capture early-stage anomalies or build actionable insights.

Safeguard and maximize value from your data

“Time Series Buying Guide for IIoT” explores how manufacturers can identify the tools and architectural practices necessary to achieve high-fidelity data sets. Readers will gain insight into the unique characteristics of IIoT time series data compared to other data types. The guide highlights which tools can maximize the value of IIoT data and warns against those that may appear beneficial but prove detrimental over time. Additionally, it offers practical advice on seamlessly integrating specialized IIoT tools into existing architectures and environments, eliminating the need for costly, large-scale system overhauls.

Download the “Time Series Buying Guide for IIoT” now.

How Does InfluxDB 3 Query Data in Real-Time?

Jessica Wachtel (InfluxData) — Wed, 22 Jan 2025 07:00:00 +0000

InfluxDB 3 builds on open-source technologies—Flight, DataFusion, Arrow, and Parquet—but even if a developer made their own time series database using the same technologies, they would not be able to replicate InfluxDB 3. The FDAP stack provides many of the building blocks required for a high-performance database, such as the fast, multi-threaded, streaming, columnar execution engine that defines InfluxDB 3. However, it does not include all the low latency and time series specializations used in InfluxDB 3. The specialized parts include a custom ingester and compactor to quickly organize incoming data for querying, optimized file organization for time series, and specialized caches that allow users to see and analyze data in real-time.

Real-time analytics

What is real-time analytics, and why is it so important for time series data? Real-time analytics for time series refers to the process of analyzing and extracting insights from time-stamped data immediately after it is ingested, with minimal latency. It empowers quick decision-making through immediate trend, anomaly, and event detection. Real-time analytics enables organizations to monitor and respond to events as they happen. Any latency during data ingestion will increase the time between incident and intervention, adding time to a process where milliseconds matter.

Real-time analytics and InfluxDB 3

InfluxDB 3’s high-performance analytics are supported by a combination of the custom ingester, compactor, catalog, caching, and custom file organization that underpin the database. These components work in tandem to deliver speed and efficiency.

Before diving into the ingester, let’s discuss the technology. As mentioned earlier, InfluxDB 3 is built on the FDAP stack (Flight, DataFusion, Arrow, and Parquet), which significantly enhances query performance. Together with the custom engineering done by InfluxData’s engineering teams, these technologies form a strong foundation for fast querying of historical data. That custom engineering includes both upstream contributions to DataFusion itself and work specific to InfluxDB 3.

What is the FDAP stack?

The Foundation: DataFusion, the Query Engine

DataFusion is an open source query execution framework written in Rust that efficiently processes large-scale data. It offers a modern SQL interface for querying data, supports multiple data sources, and includes features like distributed query execution and vectorized processing for high performance. As part of the Apache Arrow ecosystem, DataFusion integrates seamlessly with Arrow’s in-memory columnar format and works natively with Parquet files. This compatibility enables fast and efficient analytics on large datasets stored in columnar formats and makes DataFusion ideal for analytics, ETL pipelines, and big data processing.

DataFusion uses Arrow as its in-memory format and can read Parquet, along with many other file formats. DataFusion is considered a high-level library, while Arrow and Parquet are lower-level libraries that provide more control over data storage and system performance. High-level libraries offer simplified interfaces to perform specific tasks, abstracting away complexity, while low-level libraries allow more detailed customization at the cost of requiring more technical expertise. Part of DataFusion’s speed comes from building on top of the foundation that Arrow and Parquet provide.

InfluxData’s contribution to DataFusion

Although DataFusion supports various data sources, it wasn’t originally designed with time series data in mind. When InfluxDB adopted DataFusion, InfluxData engineers played a crucial role in adding the relevant extension APIs needed to build an optimized query engine for time series data. These upstream contributions have significantly boosted DataFusion’s performance, recently making it the fastest engine for querying Apache Parquet files. InfluxData added several time series-specific optimizations through DataFusion’s extension mechanisms, making it more suitable for real-time, high-performance queries on time series data.

By adopting DataFusion, InfluxDB 3, like any other project that adopts it, gains a lightning-fast query engine for time series data. Adding real-time querying capabilities introduces an additional layer of complexity, as the data must be queryable immediately upon ingestion, which is critical for time series data. It’s important to note that querying data immediately after ingestion and obtaining results against large historical datasets are two distinct challenges. InfluxDB 3 engineers spent considerable time optimizing the ingestion process because real-time analytics doesn’t exist without lightning-fast, real-time ingestion. This is why any skilled engineer could build a high-performance engine for querying historical data using the FDAP stack, but real-time querying requires a more advanced solution such as InfluxDB 3.

The ingester

The ingester is custom-built to handle the specific needs of time series data, particularly the massive volumes and velocities at which it arrives. InfluxDB 3’s ingester includes a custom parser for line protocol, a time-series-focused write-ahead log (WAL), and trade-offs that favor fast write ingestion. Speed is essential for time series workloads, so the ingester prioritizes speed over the durability and strict consistency that ACID compliance provides.
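
To give a feel for what parsing line protocol involves, here is a deliberately simplified Python sketch. It is not InfluxDB 3’s parser: it assumes no escaped characters and no spaces inside string field values, and the function names and the sample record are hypothetical.

```python
def _parse_field_value(raw):
    """Convert one raw field value into a Python value (simplified rules)."""
    if raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]                      # quoted string
    if raw in ("t", "T", "true", "True", "TRUE"):
        return True
    if raw in ("f", "F", "false", "False", "FALSE"):
        return False
    if raw.endswith(("i", "u")):
        return int(raw[:-1])                  # integer with type suffix
    return float(raw)                         # default: float

def parse_line(line):
    """Parse 'measurement,tag=v field=v timestamp' into a dict.

    Assumption: no escaping and no spaces inside values.
    """
    head, fields_part, *rest = line.strip().split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in fields_part.split(","):
        key, raw = pair.split("=", 1)
        fields[key] = _parse_field_value(raw)
    timestamp = int(rest[0]) if rest else None  # timestamp is optional
    return {"measurement": measurement, "tags": tags, "fields": fields, "time": timestamp}

record = parse_line(
    "temperature,machine=press_7,line=L1 value=21.5,rpm=1200i 1737500000000000000"
)
print(record)
```

A production parser also has to handle escaping, quoted strings containing spaces and commas, and malformed input, which is part of why a purpose-built parser matters at high ingest rates.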

When data enters InfluxDB 3, it doesn’t immediately go into Parquet. Instead, the data enters a specialized buffer, which is also based on Arrow, before eventually being written to an object store (Parquet). This system allows for faster data writes, bypassing the wait for data to be placed in object storage before responding to the client. Data is readable immediately after write without waiting for write-to-object storage. The data is eventually written to object storage in batches. This design avoids adding object store communications into the write and query path, which helps avoid the inevitable performance bottlenecks that occur if data is written to object storage in real-time.
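
The buffer-then-persist flow described above can be sketched in a few lines of Python. This is an illustrative model only, not InfluxDB 3’s implementation: the class and method names are hypothetical, a plain list stands in for the Arrow-based buffer, a list of tuples stands in for Parquet files in object storage, and the write-ahead log is omitted.

```python
class BufferedStore:
    """Toy model: writes land in memory first and are persisted later in batches."""

    def __init__(self):
        self.buffer = []        # recent rows held in memory (stand-in for the Arrow buffer)
        self.object_store = []  # persisted batches (stand-in for Parquet files)

    def write(self, row):
        # Acknowledge as soon as the row is in memory; no object store round trip.
        self.buffer.append(row)
        return True

    def flush(self):
        # Meant to run periodically, outside the write path, persisting one batch.
        if self.buffer:
            self.object_store.append(tuple(self.buffer))
            self.buffer = []

    def query(self, predicate):
        # Queries see persisted batches and rows still in the buffer.
        persisted = (row for batch in self.object_store for row in batch)
        return [row for row in (*persisted, *self.buffer) if predicate(row)]


store = BufferedStore()
store.write({"time": 1, "value": 20.0})
store.write({"time": 2, "value": 99.0})
hot = store.query(lambda r: r["value"] > 50)      # readable before any flush
store.flush()                                      # batch goes to "object storage"
store.write({"time": 3, "value": 75.0})
all_hot = store.query(lambda r: r["value"] > 50)  # spans persisted and buffered rows
print(hot, all_hot)
```

The point of the design is visible in `write`: it never touches `object_store`, so object storage latency never sits between the client and an acknowledgement.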

Trade-offs in data ingestion in distributed products

In its distributed products (Cloud Serverless, Cloud Dedicated, and Clustered), InfluxDB 3 deliberately trades immediate durability for ingestion speed. By writing data first to local disks and later to object storage, it enables users to ingest and query massive amounts of data quickly, without waiting for object storage to persist each record. While this approach sacrifices immediate durability, it ensures the system can handle high write throughput, which is crucial for time series data.

InfluxDB 3’s distributed architecture also eliminates the need for complex consensus protocols when ingesting data. Instead, it focuses on maximizing write speed, with eventual consistency handled through the compactor or at read time. This design allows fast, efficient data ingestion without sacrificing scalability or performance.

Real-time querying and data organization

In addition to the ingester, real-time querying in InfluxDB 3 relies heavily on data organization in Parquet files and advanced caching techniques. DataFusion works on flexible Parquet files that give engineers many options for data organization, including custom sorting and data divisions. InfluxDB’s CTO, Paul Dix, spent considerable time optimizing how data is divided across these files, balancing write speed against query performance to ensure that real-time querying remains fast, even as the volume of ingested data grows. InfluxDB 3 organizes data into Parquet files to optimize for both query speed and efficient storage, ensuring it delivers high-performance analytics while efficiently handling large-scale data ingestion.
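
The post does not spell out the exact file layout, so the following Python sketch only illustrates two generic ideas behind time-oriented file organization: sorting rows so each series is contiguous, and skipping files whose time range cannot match a query. `FileMeta`, `files_for_range`, `sort_rows`, and the file names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FileMeta:
    """Minimal per-file metadata: the time range covered by the file."""
    name: str
    min_time: int
    max_time: int

def files_for_range(files, start, end):
    """Keep only files whose [min_time, max_time] overlaps [start, end]."""
    return [f for f in files if f.max_time >= start and f.min_time <= end]

def sort_rows(rows, tag_keys):
    """Sort rows by tag values, then time, so each series is stored contiguously."""
    return sorted(rows, key=lambda r: (*(r[k] for k in tag_keys), r["time"]))

# Three hypothetical files, each covering one hour (times in seconds).
files = [
    FileMeta("hour0.parquet", 0, 3599),
    FileMeta("hour1.parquet", 3600, 7199),
    FileMeta("hour2.parquet", 7200, 10799),
]
print([f.name for f in files_for_range(files, 4000, 5000)])  # ['hour1.parquet']
```

Dividing data into more, smaller files makes pruning more selective but adds write and bookkeeping overhead, which is the write-speed versus query-performance balance the paragraph above refers to.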

Where to go from here

The combination of the FDAP stack and custom engineering enables a great time series experience. Though both the ingester and the FDAP stack are powerful independently, together they deliver capabilities that neither could deliver alone. InfluxDB 3 represents a breakthrough in time series database technology, combining the power of the FDAP stack with extensive custom engineering to deliver high performance for real-time analytics. By leveraging open source tools like DataFusion, Arrow, and Parquet and enhancing them with bespoke optimizations, InfluxData has created a platform that excels in both high-speed data ingestion and real-time query execution. Its innovative ingester and custom file organization strike a critical balance between speed, scalability, and efficiency, so InfluxDB 3 can handle massive time series workloads without compromising performance.

For more reading on InfluxDB 3’s architecture, check out this post written by our engineering team. Get started for free in the cloud with InfluxDB Cloud Serverless, try the Alpha of our new open source product InfluxDB 3 Core, or contact sales for a custom POC.

Maximizing IIoT Impact with Open Data, AI, and Advanced Analytics: A Comprehensive Guide

Jessica Wachtel (InfluxData) — Wed, 18 Dec 2024 07:00:00 +0000

This tech paper was created by IIoT World and InfluxDB. This post was originally published on IIoT World.

The Industrial Internet of Things (IIoT) is revolutionizing industries like manufacturing, energy, and logistics by creating more intelligent, interconnected systems that elevate productivity and efficiency. With IIoT, machines, systems, and sensors are linked in real-time, streamlining industrial automation and making predictive maintenance a reality—all while reducing downtime and costs.

Transforming industry with the power of IIoT

In today’s competitive landscape, an open data ecosystem is crucial for maximizing IIoT’s potential. Open data allows companies to modify their systems to meet unique needs without being locked into proprietary solutions. This flexibility lets companies control expenses and integrate new technologies, making infrastructure management more streamlined as they scale or expand operations.

Benefits of open data ecosystems in IIoT

Unlike rigid, closed systems, open data solutions empower manufacturers to innovate and customize, adapting to specific operational needs. This flexibility improves resource allocation and data flow, giving companies a significant edge by eliminating vendor lock-in. With open data, industrial firms can make faster adjustments and implement changes more economically, ultimately leading to smoother management and a more agile approach to new technologies.

Key tools: AI and time series databases for real-time insights

As IIoT systems grow, so do the demands for real-time data analysis. Artificial intelligence (AI) and time series databases are indispensable for managing vast amounts of time-stamped data generated by IIoT devices. This booklet showcases how AI tools like TensorFlow, PyTorch, and Apache Spark MLlib optimize industrial processes and deliver real-time insights. Time series databases, such as InfluxDB, play a pivotal role, enabling predictive maintenance, anomaly detection, and enhanced quality control to keep operations running smoothly.

Check out the guide for smarter industrial automation

“The Value of Open Data AI, ML, and Analytics Tools for IIoT” dives into how manufacturers can leverage these tools for a competitive edge. Readers will learn about:

AI Tools for Intelligent Data Insights: Discover how advanced AI tools can forecast maintenance needs and spot anomalies before they cause issues, boosting efficiency across the board.
Flexible and Scalable Open Data Ecosystems: Learn how open data solutions offer customization options that avoid vendor lock-in, empowering companies to choose the tools that best suit their needs.
Real-Time Analytics with Time Series Databases: Explore how time series databases like InfluxDB manage large-scale, time-stamped data, allowing immediate insights and faster, data-driven decisions.
Advanced Analytics for Smarter Decision-Making: See how IIoT data, paired with AI, helps decision-makers spot trends, enhance efficiency, and optimize production.

Download the guide today to explore how these cutting-edge tools and open data ecosystems are paving the way for a more efficient, flexible industrial future. Embrace the power of IIoT to keep your operations optimized and ahead of the competition.

Enelyzer’s Journey to Real-Time Sustainability Insights Using InfluxDB

Jessica Wachtel (InfluxData) — Thu, 12 Dec 2024 07:00:00 +0000

Enprove is a leading energy consultancy that helps energy-intensive industries transition to greener solutions. Through audits, expert advice, and their innovative SaaS platform, Enelyzer, Enprove drives sustainable change. More than just a platform, Enelyzer embodies visionary thinking, deep energy expertise, and a passionate commitment to a better future.

The challenge

Enelyzer is a SaaS platform that helps customers reduce their environmental footprint by adopting sustainable practices. It provides real-time or near real-time insights, reports, and visualizations, empowering businesses to refine their energy management strategies. Enelyzer’s interactive approach—asking targeted questions and delivering data-driven insights—includes essential insights on carbon footprint tracking and CGI reporting.

To deliver real-time insights, Enelyzer’s infrastructure needed robust querying and analytical capabilities. As Enelyzer scaled alongside new customers and workloads, performance bottlenecks emerged, first in its initial monolithic system built on Postgres and then in its newly modernized distributed architecture, where challenges persisted. With its focus on high cardinality time series data, Enelyzer required specialized infrastructure. Relational databases couldn’t meet these demands despite careful design and quality technology. After evaluating several options, Enelyzer’s engineering team added InfluxDB to the architecture, ensuring optimized performance for its unique time series requirements.

The solution

The Enelyzer engineering team modernized Enelyzer’s technology stack by adding a leading time series database, InfluxDB Cloud Dedicated. Cloud Dedicated is a single-tenant, managed, cloud-based time series database built for enterprise-grade workloads. The team also rewrote Enelyzer’s backend in Rust and added Telegraf for data collection. After incorporating InfluxDB and Telegraf, Enelyzer engineers found the performance boost they sought and could again deliver real-time querying and analytics to their customers.

Despite many changes, Enelyzer engineers attributed much of the performance gains to InfluxDB Cloud Dedicated. InfluxDB Cloud Dedicated offers custom partitioning, which optimizes queries and data storage. Enelyzer’s instance of InfluxDB Cloud Dedicated organizes data by measurement (similar to a database table) and point (individual data entry including timestamp, fields, and tags) using a custom partitioning strategy. This strategy enhances read and write performance by reducing concurrency and minimizing resource transactions.
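
The case study does not describe Enelyzer’s actual partition layout, so the following Python sketch only illustrates the general idea of partitioning points by a tag value and a time window. The key format, the `site` tag, and the function names are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone

def partition_key(point, tag_keys, time_format="%Y-%m-%d"):
    """Build a key from selected tag values plus the point's UTC day."""
    ts = datetime.fromtimestamp(point["time"], tz=timezone.utc)
    parts = [point["tags"].get(k, "") for k in tag_keys]
    parts.append(ts.strftime(time_format))
    return "|".join(parts)

def partition(points, tag_keys):
    """Group points so that each partition holds one tag combination and one day."""
    groups = defaultdict(list)
    for p in points:
        groups[partition_key(p, tag_keys)].append(p)
    return dict(groups)

# Hypothetical points; times are Unix seconds.
points = [
    {"tags": {"site": "plant_a"}, "time": 1733961600, "fields": {"kwh": 12.5}},
    {"tags": {"site": "plant_a"}, "time": 1733961600 + 86400, "fields": {"kwh": 11.9}},
    {"tags": {"site": "plant_b"}, "time": 1733961600, "fields": {"kwh": 40.2}},
]
for key, rows in partition(points, ["site"]).items():
    print(key, len(rows))
```

With a layout like this, a query for one site over one day only needs to read a single partition instead of scanning data for every site.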

Architecture

InfluxDB is compatible with all technical products and services within the open data ecosystem. This highly compatible nature allows the Enelyzer engineering team to deliver features faster and troubleshoot bugs quicker.

Enelyzer offers two customer-facing options for data visualization, charting, and analytics. Enelyzer’s UI provides intuitive charts, visualizations, and custom reports. For customers needing more advanced analysis and alerting, Enelyzer integrates directly with Grafana.

Data enters Enelyzer’s ecosystem through the open-source plugin agent Telegraf. Using Telegraf, Enelyzer continuously ingests data from diverse sources—such as MQTT, API, FTP, PLCs, and third-party or proprietary hardware—in real-time, avoiding vendor lock-in. Telegraf’s plugins for data collection, output, and transformations enable Enelyzer to maintain a unified data pipeline regardless of input and output sources.

In its latest modernization, Enelyzer’s engineering team added Mage AI to the stack. Mage AI manages task scheduling and ETL pipelines with InfluxDB, transforming incoming data as part of Enelyzer’s data normalization process. They also introduced ontologies with reasoning capabilities to support machine learning models.

By integrating InfluxDB Cloud Dedicated, Enelyzer engineers unlocked substantial performance gains directly supporting its mission to help businesses reduce their environmental footprint. Customers benefit from customized real-time reports and queries, AI-driven forecasting, and seamless real-time data collection from any data source. To learn more about Enelyzer, read the full case study here.

Case Study: Modernizing SPEN's Tech Stack with Capula and InfluxDB

Jessica Wachtel (InfluxData) — Tue, 05 Nov 2024 08:00:00 +0000

Background

Scottish Power Energy Networks (SPEN) started a journey to improve its technology by working with Capula and using InfluxDB. Let’s explore the use case, goals, and the role of InfluxDB in this modernization effort.

In 2020, SPEN looked at its data historian systems and realized it needed a better solution to meet future data storage needs. With a focus on modernizing their systems and addressing business challenges through data, they sought a new technology solution.

Capula, a seasoned systems integrator familiar with SPEN’s infrastructure, recommended InfluxDB, a time series database that can manage high volumes of diverse data types. Capula provided comprehensive consultancy services, employing a layered approach to understand organizational needs, design solutions, and apply technologies that enhance the convergence of operational technology (OT) and information technology (IT). Capula’s efforts included conducting tests and demonstrations to ensure system reliability and cybersecurity, such as Telegraf buffering, visualizing alarm data, and ingesting real-time data using OPC-UA protocols.

Capula undertook a two-phase proof-of-concept project focusing on learning the intricacies of InfluxDB’s architecture and its integration within SPEN’s systems. They performed functional benchmarking, addressed issues related to high data cardinality, optimized data ingestion processes, and tested the distributed architecture of InfluxDB to ensure system stability. Additionally, they assisted in designing data retention policies and preparing for future updates and cybersecurity enhancements.

This ultimately led to a successful initial proof of concept in 2021, which expanded into a more comprehensive integration with SPEN’s existing systems that aligned with their modernization goals and enhanced data quality, reliability, and scalability.

Goals

The specific goals SPEN aimed to achieve by modernizing its data historian systems included:

Handling dramatic increases in data storage requirements.
Consolidating analog and digital data into one system for easier comparison and analysis.
Ensuring the system’s reliability and stability.
Preparing for future data volume and requirements.
Testing new technologies to build knowledge on technical requirements and inform future purchases.
Enhancing decision-making processes by integrating diverse data types and improving data quality and integrity.

Role of InfluxDB

InfluxDB played a crucial role in modernizing SPEN’s tech stack by providing a scalable and efficient solution for handling high volumes of diverse data types. It enabled real-time data ingestion and processing, essential for managing energy sensor data. InfluxDB’s ability to blend different data types, such as field, asset, and operational data, facilitated easier access and integration, enhancing decision-making processes. Additionally, its advanced features for data cleaning, system interoperability, and the capability to design custom data retention policies further demonstrate its effectiveness.

Benefits

The new system significantly improved SPEN’s data quality, reliability, and scalability. It allows for the ingestion of diverse data types without a specific data model (aka schema-on-write), which enhances data quality by enabling the blending of various data sources. The system’s distributed architecture ensures reliability by maintaining database access even if a node becomes unavailable. The system achieves scalability through its ability to handle high volumes of data and its optimization for write-intensive workloads, making it suitable for SPEN’s growing data requirements.

Wrapping up

Overall, the journey involved extensive testing, functional benchmarking, and the development of a scalable and efficient data management solution to support SPEN’s goal of achieving net zero carbon by 2050. InfluxDB stands to play a pivotal role in SPEN’s ongoing transformation, providing the necessary tools and capabilities to meet its current and future data management needs.

Go here to learn more about using InfluxDB for energy and utilities.

Unlock Value with InfluxDB 3 and Expert Support Teams

Jessica Wachtel (InfluxData) — Tue, 06 Aug 2024 08:00:00 +0000

InfluxDB is all about your data: we bridge the gap between an empty database bucket and business value and provide experts to help you derive value from your data. InfluxDB expert support teams come with contracted InfluxDB 3 serverless products (Serverless, Cloud Dedicated) and our Clustered on-prem product. Though no customer is left to figure everything out on their own, your product selection will determine the level of custom support you receive. This post provides insight into working with InfluxDB support for both serverless and on-prem instances of InfluxDB 3.

InfluxDB 3 products

All InfluxDB 3 products are built on the same columnar database engine, purpose-built for time series workloads. Compared to earlier OSS versions, you can expect better write throughput, a reduction in storage costs due to support for cheaper object storage, faster queries for high cardinality data, and faster queries for recently written data across all 3 products. Where they differ is the level of data isolation, ability to customize your build, and extent of support involvement.

InfluxDB Cloud Serverless

Cloud Serverless is a serverless, multi-tenant instance of InfluxDB, managed on shared resources. It’s best suited for smaller time series workloads, whether that means companies with less time series data or customers with a large workload who want to use InfluxDB for only some of their data. As your workload grows, you can easily move to an isolated instance of InfluxDB. You can get started with InfluxDB Cloud Serverless in just a few minutes.

InfluxDB Cloud Dedicated

Cloud Dedicated is a serverless single-tenant instance of InfluxDB, managed on isolated resources. It’s designed for enterprise-grade workloads at scale and those who require data isolation and prefer a serverless, managed product. Cloud Dedicated offers a higher level of customization than Cloud Serverless: you can build custom partitions and optimize query performance. Our support team will work with you to custom-tune your Cloud Dedicated instance based on your workload’s specifications. InfluxDB support merges its operations behavior with a customer-first focus, leading to a very high-touch experience for Cloud Dedicated customers.

InfluxDB Clustered

Clustered is an on-prem or private cloud instance of InfluxDB. Like Cloud Dedicated, Clustered is designed for enterprise-grade time series workloads operating at scale. Because Clustered runs in your environment, it’s a self-managed service that provides the highest degree of customization. This includes query optimizations, custom partitioning, and additional tuning, which are only available in the self-managed infrastructure. Clustered operates in the Kubernetes environment. This is key when evaluating whether Clustered is the right product for your organization. If you don’t have engineers with Kubernetes expertise, Clustered may require additional developer resources on your end and potential changes to your development environments.

Support tiers

Consultative Support

Though Clustered is a self-managed product, there is an overlap with Cloud Dedicated and Serverless in terms of consultative support. Customers aren’t limited by their engineering teams’ level of InfluxDB expertise. Our support team works alongside our customers’ engineering teams to fill the experiential gaps between them and what InfluxDB needs.

One of the most critical services our support team offers is providing context and a deep understanding of how best practices apply to your preferences and desired outcomes. How you decide to store your data will have a significant impact down the line. Our support team works with your engineering teams to ensure that best practices form the underlying data management foundation. This will save human and financial resources and offer optimal results from your data.

Schema design and data durability are two examples where support helps customers with best practices. InfluxDB uses schema on write, adding flexibility to how you decide to store your data. However, it’s important to consider the optimal number of tags and contextual information collected in a data set to optimize data storage and overall performance. Data durability refers to how long you keep data. Optimizing storage looks very different for customers with ephemeral data stored in InfluxDB for 48 hours versus data stored in perpetuity.
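
A quick back-of-the-envelope calculation shows why the retention window matters so much. The sketch below assumes a hypothetical, steady ingest rate of 10,000 points per second and uses one year as a stand-in for long-term storage; both numbers are illustrative, not figures from InfluxDB.

```python
def retained_points(points_per_second, retention_hours):
    """Number of points held at steady state for a given retention window."""
    return points_per_second * retention_hours * 3600

rate = 10_000  # hypothetical ingest rate, points per second

two_days = retained_points(rate, 48)        # 1_728_000_000 points
one_year = retained_points(rate, 24 * 365)  # 315_360_000_000 points

print(two_days, one_year, one_year / two_days)  # ratio is 182.5
```

At the same ingest rate, keeping data for a year holds over 180 times as many points as a 48-hour window, so storage and schema choices that are harmless for ephemeral data can become expensive for data kept in perpetuity.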

Managed Service vs. Self-Managed Service

The main difference between a managed service and a self-hosted service is the level of visibility our support teams have into your database instance. InfluxData hosts its serverless products on its own infrastructure, meaning our support teams have complete visibility into your instance of InfluxDB. This eliminates the need for customers to manage any database resources, including migrating to InfluxDB from another product, database security, backup and recovery, high availability configurations, performance monitoring, and more.

Outsourced Resources

InfluxDB managed services offer customers a “set it and forget it” experience. The service removes the technical overhead burden from the customer, including additional headcount and engineering resources, because you’re outsourcing your database administration team.

In the event of an issue, both managed and self-managed products have access to our support team, but the support workflow differs between the two. Since our team has direct access to your cluster in a managed service, support can immediately begin troubleshooting and acting on solutions with minimal to no customer intervention. With our self-managed service, our support team works alongside your engineers because they won’t have direct access to the system.

Scaling

Our Clustered and Cloud Dedicated products scale alongside temporary and permanent customer growth. When a Cloud Dedicated cluster receives an increased data load, a support team member reviews the cluster and proactively or reactively adjusts memory or instantly increases an instance’s size. Though it might look automatic to the customer, it is not an automatic process. We can manually scale clusters up and down in response to appropriate qualifying events. This scaling support is included in our managed service.

It isn’t hard to scale up on-prem, but it requires a little more legwork from our customers. Rather than InfluxDB support scaling your cluster, your engineers must complete the action. They’ll need to monitor data volume and scale accordingly. When you’re ready to scale, let your Account Executive know, and they can provide additional licenses to purchase.

Maintenance

Maintenance updates become available to Cloud Serverless, Cloud Dedicated, and Clustered customers at the same time. The speed at which they are incorporated into your instance of InfluxDB varies between managed and self-managed services. Our managed service performs all maintenance updates and operations on your behalf without interruption. On-prem customers will have to conduct their own maintenance operations. Depending on the organization, internal review and approval processes can delay how quickly updates are installed.

Disaster Recovery

Customers accidentally delete their data. Not often, but it happens. The degree to which the data is recoverable doesn’t vary between self-managed and managed database instances. However, the speed of recovery may. When using a managed product, InfluxDB engineers immediately begin the restoration troubleshooting process. Support cannot work directly in your cluster if you’re using a product within your own environment, but they’ll work hand-in-hand with your engineers throughout the restoration process. You’ll recover your data in both cases, but data recovery is quicker in a managed environment.

Get started with InfluxDB!

InfluxDB offers a range of products tailored to meet various organizational needs, from managed cloud solutions like Cloud Serverless and Cloud Dedicated to self-managed on-prem solutions like Clustered. Our support teams work closely with your engineers to ensure seamless operations, whether you choose a managed or self-managed service. By leveraging our expertise and customizable solutions, you can optimize your data storage, performance, and recovery processes, ultimately driving greater business value and operational efficiency.

Ready to get started? Contact sales for a custom POC or sign up for a free cloud account now.

Optimizing Space Technology: Fast Data Access with InfluxDB and Apache Parquet

Jessica Wachtel (InfluxData) — Wed, 10 Jul 2024 08:00:00 +0000

To win the space race, aerospace and aviation companies must be fast. The end-to-end cycle of testing, visualizing test data, and making improvements demands swiftness, especially when a single launch yields billions of data points. It starts with real-time access to data. Real-time data analysis with nanosecond precision is crucial for monitoring environmental and habitat conditions when lives are at stake.

Speeding up the iteration pipeline is essential but not sufficient. Cost efficiency matters too. Air and space innovation is expensive, and strong data analysis practices can save money. Using data to inform decisions optimizes processes and ensures space technology is built right the first time, reducing both costs and waste.

Access to production system data is another challenge. It’s one thing to pull telemetry data into a database; it’s another to share access to that data. When equipment is in flight, incoming data is vital. Any deviation from expected values must reach engineers immediately for swift action. However, once the data ages, more teams and data scientists need to query and analyze it to make further improvements and continue the iteration process.

InfluxDB 3 and Apache Parquet facilitate quick data access across the organization, eliminating vendor lock-in limitations. These tools ensure reliable data access throughout the product pipeline, helping to speed up iterations, reduce costs, and provide fast, accurate data access. By choosing software from the open data ecosystem, you can protect your team and data while accelerating innovation.

Why InfluxDB and Apache Parquet?

Purpose-built time series database InfluxDB is the gateway to faster data accessibility and availability. InfluxDB handles the high velocity and volume of time series data in real-time and persists that data as Apache Parquet files. Parquet has become the standard in the open data ecosystem. This means after InfluxDB ingests the data, anyone with access to the database can easily download Parquet files from the production system and load the data into any of the many tools participating in the open data ecosystem or another instance of InfluxDB.

This eliminates the need for custom data formatting and giant CSV downloads and uploads, and it keeps direct access to the production system limited. By participating in the open data ecosystem, Parquet allows users to extend the value and efficacy of time series data to other areas and applications that were not previously possible.

Parquet is an open source, columnar data file format designed for fast processing of complex data. Parquet supports different encoding and compression schemes on a per-column basis, which allows for efficient bulk data storage and retrieval. A number of projects and platforms have adopted the standard, including Delta Lake, Apache Iceberg, Snowflake, Hive, Spark, Redshift, Google BigQuery, and Pandas. They’re available to all organizations working within the open data ecosystem. Many of these projects are built around object storage with Parquet files and an elastic query tier to process the files.

Getting data into InfluxDB from nearly any system or device is also smooth. Telegraf, the open source, server-based agent, collects data from countless databases, systems, and sensors. Telegraf has over 300 plugins, making InfluxDB a seamless addition to any tech stack.

In addition to moving Parquet files to other systems, working with data inside InfluxDB also benefits every stage of technical integration. Because InfluxDB itself is part of the open data ecosystem, it connects you to automation, machine learning, and artificial intelligence tools that shorten time-to-market. This includes dashboarding software such as Grafana, Tableau, and Power BI. Gain a competitive edge with improved insights by integrating with leading ML/AI tools such as TensorFlow and Petastorm.

Try InfluxDB today

Ready to get started with Parquet files? Sign up for a free cloud account today. If you’re unsure of the size of your workload and want to learn more about what you can do with InfluxDB, contact our sales team here.

Boost Your Monitoring Stack: Add InfluxDB to Prometheus Node

Jessica Wachtel (InfluxData) — Thu, 20 Jun 2024 08:00:00 +0000

Prometheus is the go-to observability tool for countless developers and organizations, and for good reason. The popular open source tool doesn’t require any up-front costs or result in vendor lock-in. Prometheus’ short on-ramp makes the technology well-suited for organizations looking to jump-start their cloud monitoring journey.

However, there is a downside to using Prometheus as the single source of truth for observability metrics. Prometheus was built for use as a single node. Many of the issues it faces are similar to challenges associated with all single-node tools. Though these are not reasons to avoid the software altogether, they render Prometheus unsuitable as the sole observability platform for mission-critical applications. But there is a solution. Adding InfluxDB to an existing stack will not change Prometheus’ workflow and will add enterprise-grade features such as data durability and high availability. The remainder of this article will focus on how Prometheus functions and how it can work with InfluxDB.

Prometheus: the basics

Prometheus is a metrics-based monitoring system that provides insight into connected systems and services. It collects and stores metrics as time series data, which is any data stored with a timestamp. This provides users with a better understanding of the metrics at a certain point in time. Prometheus provides a toolchain of libraries and server components that allow users to expose, collect, and record metrics.

Prometheus is natively compatible with nearly all tech stacks. It collects metrics by pulling them through an exposed HTTP endpoint that serves a text-based metrics transfer format understood by Prometheus. The text-based metrics format promotes seamless metrics collection, even when developers cannot use a dedicated client library for tracking and exposing metrics. Once the data lands in Prometheus, developers have deep insight into their system. Prometheus excels with real-time monitoring and short-term storage.
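
For readers who have not seen it, the text-based format is line oriented: optional "# HELP" and "# TYPE" comment lines, followed by one sample per line made up of a metric name, optional labels, and a value. The sketch below uses a made-up sample payload and a hypothetical helper, metric_samples, to pull one metric out of it. The metric names and the localhost:9100 address in the final comment are illustrative assumptions, not part of any setup described in this article.

```sh
# Sample payload in the Prometheus text exposition format (values are made up).
sample_metrics() {
  cat <<'EOF'
# HELP http_requests_total Total HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 42
EOF
}

# Hypothetical helper: print the samples for one metric name,
# skipping the "#" comment lines.
metric_samples() {
  grep -v '^#' | grep "^$1[{ ]"
}

sample_metrics | metric_samples http_requests_total

# Against a real target you would pipe a scrape instead, for example:
# curl --silent http://localhost:9100/metrics | metric_samples node_load1
```

Because the format is plain text over HTTP, anything that can print lines like these can be scraped, which is why a dedicated client library is optional.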

When it comes to alerting and dashboarding, open source dashboarding and alerting software Grafana is Prometheus’ go-to companion. Grafana helps both technical and non-technical people discover data insights. Grafana has easy-to-understand, ready-made dashboards that “just work” out of the box. Grafana also provides templates for developers looking to build custom dashboards. The Grafana/Prometheus connection is so popular that Grafana’s Prometheus community is one of its most mature.

Prometheus: the challenges

Prometheus is a single-node system. While there’s nothing inherently wrong with that, many of the features required for mission-critical, enterprise-level systems are lacking. Mission-critical applications require high availability and data durability, which a single node cannot provide. Prometheus doesn’t have any built-in data replication mechanisms that are mandatory for data durability. This means if any of the existing mechanisms or hard drives fail, users will encounter data loss.

Because Prometheus can’t guarantee high availability, it offers no assurances against unplanned downtime and does occasionally go down. This also causes unrecoverable data loss. Lastly, due to its size, Prometheus doesn’t scale well or offer any long-term storage. But there is a simple way to safeguard an application against all these issues. Add an instance of purpose-built time series database InfluxDB alongside Prometheus and turn the single node into a highly available, mission-critical, enterprise-grade observability system.

InfluxDB: the solution

InfluxDB is a durable, scalable, enterprise-grade time series database that works in tandem with Prometheus. Developers can add InfluxDB to an existing Prometheus stack without changing how Prometheus operates, and experience all the benefits of using Prometheus for mission-critical applications. Out of the box, InfluxDB offers high availability and data durability. InfluxDB scales alongside any workload and has no forced retention policies. InfluxDB persists data as highly compressed Apache Parquet files and stores data indefinitely in low-cost object storage.

InfluxDB is well suited to work alongside Prometheus because it’s one of the few databases that can support Prometheus data. Prometheus data is known for having high cardinality. InfluxDB 3.0 supports unbounded cardinality, meaning organizations can collect data in any resolution they prefer. Organizations can extend Prometheus-based insight by sending data to InfluxDB. Because Parquet is a universal standard, InfluxDB is the gateway between Prometheus and the bleeding-edge data analysis tools available within the open data ecosystem.

InfluxDB offers self-hosted, on-prem, and both single and multi-tenant cloud products, all of which are compatible with Prometheus. Below are two options of how to add InfluxDB to an existing Prometheus stack without changing the data collection workflow.

Enterprise-grade stack

Writing data to both Prometheus and InfluxDB at the same time from the same source will alleviate any issues caused by single-node architecture. The InfluxDB backup guarantees high availability, redundancy, and scalability. Dual writing offers the safety of continued data collection if Prometheus experiences an outage during data collection. Without the backup, any outages will cause permanent data loss.

This architecture allows developers to query dashboards from InfluxDB or continue using Prometheus as their primary source for metrics. InfluxDB is compatible with Grafana, so organizations interested in querying both databases could do so in a single pane of glass. Developers can turn on the Grafana connection in a pinch if they’re using InfluxDB as a backup and need to review data when Prometheus is suddenly unavailable.

Backup Prometheus metrics

InfluxDB’s Telegraf plugin offers simple ingestion of Prometheus data into InfluxDB.

Sending data directly from Prometheus to InfluxDB offers more data durability and long-term storage than using Prometheus alone. It does not, however, prevent lost data if Prometheus becomes suddenly unavailable. Developers can also send Prometheus metrics to InfluxDB in the enterprise-grade stack.
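
As a rough sketch of what the Telegraf side can look like, the snippet below writes a minimal telegraf.conf that scrapes a Prometheus-format endpoint with the inputs.prometheus plugin and forwards the metrics through the outputs.influxdb_v2 plugin. The scrape URL, INFLUXDB_HOST, DATABASE_NAME, the empty organization value, and the INFLUX_TOKEN environment variable are all placeholders and assumptions; check the plugin documentation for the settings your InfluxDB product expects.

```sh
# Where to write the config; defaults to the current directory.
CONF="${CONF:-./telegraf.conf}"

# The quoted heredoc keeps ${INFLUX_TOKEN} literal so that Telegraf,
# not the shell, substitutes it from the environment at startup.
cat > "$CONF" <<'EOF'
# Scrape a Prometheus-format /metrics endpoint (placeholder URL).
[[inputs.prometheus]]
  urls = ["http://localhost:9100/metrics"]
  metric_version = 2

# Forward the scraped metrics to InfluxDB (placeholder host and database).
[[outputs.influxdb_v2]]
  urls = ["https://INFLUXDB_HOST"]
  token = "${INFLUX_TOKEN}"
  organization = ""
  bucket = "DATABASE_NAME"
EOF

echo "wrote $CONF"

# Dry run that gathers metrics once and prints them instead of writing:
# telegraf --config "$CONF" --test
```

Pointing Telegraf at the same endpoints Prometheus scrapes leaves the existing Prometheus configuration untouched.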

Prometheus and InfluxDB: better together

Adding InfluxDB to a Prometheus stack will round out Prometheus’s rough edges without changing the way organizations view and analyze data. Plenty of resources exist to help get the most out of InfluxDB and Prometheus. Learn more about InfluxDB here. Learn more about InfluxDB’s Prometheus Telegraf plugin here.

Ready to get started? Sign up for a free cloud account or contact sales.

Efficiency Unleashed: Streamlining Workflows with the InfluxDB Management API

Jessica Wachtel (InfluxData) — Thu, 23 May 2024 08:00:00 +0000

InfluxDB recently launched the InfluxDB Management API for InfluxDB Cloud Dedicated. Now, developers can manage databases and database tokens and create database tables with custom partitioning directly from their application. The Management API provides a programmatic interface for performing tasks that previously required human interaction.

This interface promotes easier workflows for applications that need automatic provisioning of multiple instances of InfluxDB, either for internal or external purposes. Developers can also include database management in their own code and applications through integrated functions, boosting automation, speed, and efficiency.

Previously, the only way to provision database services was through the influxctl CLI. This option is still available and unchanged. The Management API simply offers a second option for developers who need automation.

How it works

Similar to provisioning database services through the influxctl CLI, the Management API requires a management token. Unlike database tokens, management tokens regulate databases, tables, and access tokens in your cluster. Their purpose does not extend to reading or writing data.

InfluxDB v3’s management tokens are short-lived by default. The OAuth2 identity provider grants a specific user administrative access to your InfluxDB cluster. For automation purposes, developers can now create long-lasting management tokens that bypass the OAuth identity provider and authenticate directly with their InfluxDB cluster. The long-lasting tokens work with the Management API and automation workflow because they don’t require human interaction. Use the influxctl CLI to create long-lasting management tokens.

You can access the Management API endpoints for your cluster at:

https://console.influxdata.com/api/v0/accounts/{accountId}/clusters/{clusterId}

Manage databases

Create, list, update, and delete databases in your Cloud Dedicated cluster at the following endpoint:

/api/v0/accounts/{accountId}/clusters/{clusterId}/databases
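
Listing is the simplest of these operations and makes a reasonable first smoke test for a new management token. The sketch below assumes the list operation is a plain GET on the endpoint above. The helper names mgmt_url and list_databases are hypothetical, and ACCOUNT_ID, CLUSTER_ID, and MANAGEMENT_TOKEN are expected as environment variables.

```sh
# Hypothetical helper: build a Management API URL for a resource
# from an account ID, a cluster ID, and a resource name.
mgmt_url() {
  printf 'https://console.influxdata.com/api/v0/accounts/%s/clusters/%s/%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: list the databases in a cluster.
# Fails early with a message if a required variable is unset.
list_databases() {
  curl --silent --show-error --fail \
    --header "Accept: application/json" \
    --header "Authorization: Bearer ${MANAGEMENT_TOKEN:?set MANAGEMENT_TOKEN}" \
    "$(mgmt_url "${ACCOUNT_ID:?set ACCOUNT_ID}" "${CLUSTER_ID:?set CLUSTER_ID}" databases)"
}

# Usage (sends a real request, so it is left commented out):
# list_databases
```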

Example: create a database

  curl \
	--location "https://console.influxdata.com/api/v0/accounts/ACCOUNT_ID/clusters/CLUSTER_ID/databases" \
	--request POST \
	--header "Accept: application/json" \
	--header 'Content-Type: application/json' \
	--header "Authorization: Bearer MANAGEMENT_TOKEN" \
	--data '{
  	"name": "'DATABASE_NAME'",
  	"maxTables": 500,
  	"maxColumnsPerTable": 250,
  	"retentionPeriod": 2592000000000,
  	"partitionTemplate": [
    	{
      	"type": "tag",
      	"value": "TAG_KEY_1"
    	},
    	{
      	"type": "tag",
      	"value": "TAG_KEY_2"
    	},
    	{
      	"type": "bucket",
      	"value": {
        	"tagName": "TAG_KEY_3",
        	"numberOfBuckets": 100
      	}
    	},
    	{
      	"type": "bucket",
      	"value": {
        	"tagName": "TAG_KEY_4",
        	"numberOfBuckets": 300
      	}
    	},
    	{
      	"type": "time",
      	"value": "%Y-%m-%d"
    	}
    	]
  	}'

InfluxDB Cloud Dedicated customers can customize partitions. A partition is a logical grouping of data stored in Apache Parquet format in InfluxDB v3. By default, InfluxDB Cloud Dedicated partitions data by day. Developers can customize their partition strategy to improve query performance if the default mode isn’t optimal for their schema and workload.
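
One field in the request above deserves a closer look: retentionPeriod. Assuming the Management API interprets this field in nanoseconds (confirm this against the API reference for your cluster), the example value of 2592000000000 works out to roughly 43 minutes rather than 30 days. The hypothetical helper below is shown only to make the arithmetic explicit and to avoid counting zeros by hand. It relies on 64-bit shell arithmetic.

```sh
# Hypothetical helper: convert whole days to nanoseconds.
# Assumes retentionPeriod is expressed in nanoseconds.
days_to_ns() {
  echo $(( $1 * 24 * 60 * 60 * 1000000000 ))
}

# Hypothetical helper: convert nanoseconds to whole minutes (rounded down).
ns_to_minutes() {
  echo $(( $1 / 1000000000 / 60 ))
}

days_to_ns 30                 # 30 days expressed in nanoseconds
ns_to_minutes 2592000000000   # the example value expressed in minutes
```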

Database tokens

Database tokens grant read and write permissions to one or more databases in your cluster. After a developer generates a database token, it can be distributed to applications, devices, and customers. To authenticate an application, include the database token and database name in each write or query request to a cluster.
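
As a rough illustration of that last point, the sketch below writes a single point using a database token. It assumes the cluster exposes the v2-compatible write endpoint (/api/v2/write) with the database name passed as the bucket query parameter. CLUSTER_HOST, the home measurement, and the helper names are placeholders invented for this example.

```sh
# Hypothetical helper: build the write URL from a cluster host and a database name.
write_url() {
  printf 'https://%s/api/v2/write?bucket=%s&precision=s\n' "$1" "$2"
}

# Hypothetical helper: send one line-protocol record to the cluster.
# Expects CLUSTER_HOST, DATABASE_NAME, and DATABASE_TOKEN in the environment.
write_point() {
  curl --silent --show-error --fail \
    --request POST \
    --header "Authorization: Bearer ${DATABASE_TOKEN:?set DATABASE_TOKEN}" \
    --header "Content-Type: text/plain; charset=utf-8" \
    --data-binary "$1" \
    "$(write_url "${CLUSTER_HOST:?set CLUSTER_HOST}" "${DATABASE_NAME:?set DATABASE_NAME}")"
}

# Usage (sends a real request, so it is left commented out):
# write_point "home,room=kitchen temp=22.5 $(date +%s)"
```

The token travels in the Authorization header and the database name in the URL, so both accompany every request.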

Create, list, update, and delete database tokens in your Cloud Dedicated cluster at the following endpoint:

/api/v0/accounts/{accountId}/clusters/{clusterId}/tokens

Each device requires a unique token to write and query data stored in one or more cluster databases. Database token rotation is a security best practice and common workflow. This includes creating and assigning new tokens and deleting old tokens on a schedule.

Example: create and delete tokens

curl \
   --location "https://console.influxdata.com/api/v0/accounts/ACCOUNT_ID/clusters/CLUSTER_ID/tokens" \
   --header "Accept: application/json" \
   --header 'Content-Type: application/json' \
   --header "Authorization: Bearer MANAGEMENT_TOKEN" \
   --data '{
 	"description": "Read/write token for DATABASE_NAME",
 	"permissions": [
   	{
     	"action": "write",
     	"resource": "DATABASE_NAME"
   	},
   	{
     	"action": "read",
     	"resource": "DATABASE_NAME"
   	}
 	]
   }'
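
The request above covers the create half of a rotation. For the delete half, the sketch below assumes a token is removed with a DELETE request to the same endpoint with the token ID appended (/tokens/TOKEN_ID). That follows common REST conventions but should be confirmed in the API reference. The helper names token_url and delete_token are hypothetical.

```sh
# Hypothetical helper: build the URL for a single database token
# from an account ID, a cluster ID, and a token ID.
token_url() {
  printf 'https://console.influxdata.com/api/v0/accounts/%s/clusters/%s/tokens/%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: delete one database token by ID.
# Expects ACCOUNT_ID, CLUSTER_ID, and MANAGEMENT_TOKEN in the environment.
delete_token() {
  curl --silent --show-error --fail \
    --request DELETE \
    --header "Accept: application/json" \
    --header "Authorization: Bearer ${MANAGEMENT_TOKEN:?set MANAGEMENT_TOKEN}" \
    "$(token_url "${ACCOUNT_ID:?set ACCOUNT_ID}" "${CLUSTER_ID:?set CLUSTER_ID}" "$1")"
}

# Usage (sends a real request, so it is left commented out):
# delete_token OLD_TOKEN_ID
```

In a scheduled rotation job, delete the old token only after the new one has been distributed and confirmed to work, so that devices are never left without valid credentials.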

To learn more about the InfluxDB Management API, check out our documentation. To learn more about Cloud Dedicated’s performance, please see our benchmarking paper or contact our sales team for further assistance.