Announcing InfluxDB Edge Data Replication: Combining the Power of the Cloud with the Precision of the Edge
By Sam Dillard / Jun 15, 2022 / InfluxData, Community, InfluxDB, Developer
There are technical and business reasons to have a time series data presence both at the edge and in the cloud – InfluxDB has always played a key role in both contexts. Today, we’re announcing Edge Data Replication, a new feature that combines these two deployment strategies. With this announcement, InfluxData begins a broader initiative to accommodate both edge and cloud data workloads in one unified solution.
The edge is becoming more critical than ever. Businesses need new solutions at the edge, and the rapid increase in data from edge devices creates enormous challenges for developers who must manage and use that data effectively. Data has gravity, and that gravity gets stronger as applications become more distributed and data volumes at the edge grow exponentially.
Thousands of businesses use InfluxDB Open Source (OSS) at the edge for local collection, storage, and processing of time series data. This gives InfluxDB the role of managing the edge data layer, allowing API access, locally served dashboards, integration with automation and controls, and delivery to alerting systems. In addition, thousands more organizations and developers have adopted InfluxDB Cloud to build cloud-native time series applications.
The requirements for edge data and for cloud data are each strong, and at times strict. There are many considerations when choosing the edge or the cloud for storage and compute, but ultimately developers want these two data layers to work together in a simple and efficient way that delivers centralized business insights in near real time. Until now, architecting an edge-to-cloud integration required significant upfront and ongoing investment, and the result was expensive, time-consuming, and error-prone.
Introducing InfluxDB Edge Data Replication
Today we’re excited to announce the immediate availability of Edge Data Replication, a solution that unifies time series data processing between edge and cloud environments so developers can provide consolidated intelligence across widely distributed environments. Edge Data Replication (EDR) makes it possible to safely replicate data from InfluxDB OSS buckets to InfluxDB Cloud buckets in real time, meaning users can now collect, store and analyze high-precision time series data from the edge, and view that data in the cloud.
This new feature builds upon two key properties of InfluxDB OSS. First, InfluxDB OSS can run very efficiently on resource-constrained systems. Second, InfluxDB OSS runs Flux, a functional data scripting language and task engine that can analyze and transform time series data in any way developers choose. At InfluxData, we’re focused on simplifying the developer experience around time series data, so developing a way to connect these OSS properties to enable a full-featured edge-to-cloud data pipeline was a no-brainer.
Version 2.2 of InfluxDB introduces a new feature to automate the replication of time series data to other InfluxDB instances. Durable, disk-backed queues add buffering to withstand planned and unplanned disruptions in network connectivity. In addition, this feature configures replication at the bucket level so developers and operators can precisely define which sets of data to replicate and where to send them.
This replication happens on-write. When time series data arrives at the edge and matches a replication rule, OSS immediately mirrors the data to the remote bucket defined in the configuration. When InfluxDB OSS cannot send the data to the remote instance, all events buffer in the local durable queue until connectivity is restored. The size and retention of the durable queue are configurable.
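The buffering behavior described above can be modeled in a few lines of Python. This is a conceptual sketch, not InfluxDB's implementation: the `ReplicationQueue` class, its `send` callback, and the `max_bytes` limit are hypothetical names, the buffer lives in memory rather than on disk, and the drop-oldest policy used when the buffer is full is an assumption made for illustration.

```python
from collections import deque
from typing import Callable, Deque


class ReplicationQueue:
    """Toy model of on-write replication with a bounded local buffer.

    Assumption: this in-memory deque stands in for the disk-backed queue;
    a real durable queue would persist entries across restarts.
    """

    def __init__(self, send: Callable[[str], bool], max_bytes: int) -> None:
        self._send = send            # returns True when the remote accepted the line
        self._max_bytes = max_bytes  # hypothetical size limit for the buffer
        self._pending: Deque[str] = deque()
        self._pending_bytes = 0

    def write(self, line: str) -> None:
        """Buffer a line, then immediately try to drain the queue in order."""
        self._pending.append(line)
        self._pending_bytes += len(line.encode("utf-8"))
        # Assumed policy: evict the oldest lines once the limit is exceeded.
        while self._pending_bytes > self._max_bytes and len(self._pending) > 1:
            dropped = self._pending.popleft()
            self._pending_bytes -= len(dropped.encode("utf-8"))
        self.flush()

    def flush(self) -> int:
        """Send buffered lines oldest-first and stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # remote unreachable: keep the data for a later retry
            done = self._pending.popleft()
            self._pending_bytes -= len(done.encode("utf-8"))
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

While `send` keeps returning `False`, every call to `write` leaves its line in the buffer. Once `send` succeeds again, the next `write` or `flush` delivers the backlog in its original order.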
Concretely, the feature consists of two new API endpoints (/remotes and /replications) and two new CLI commands (remote and replication). Each replicated bucket also gets its own disk-backed queue that buffers data safely during disruptions.
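As a rough illustration of how the two endpoints fit together, the Python sketch below builds (but does not send) the two requests: one that registers the cloud instance as a remote, and one that ties a local bucket to a bucket on that remote. The post names only the /remotes and /replications paths, so the `/api/v2` prefix, the JSON field names, the helper function names, and all URLs, tokens, and IDs shown here are assumptions or placeholders to check against the API documentation.

```python
import json
from urllib.request import Request

# Assumption: the endpoints live under the v2 API prefix.
API_PREFIX = "/api/v2"


def build_post(base_url: str, path: str, token: str, payload: dict) -> Request:
    """Build (without sending) an authenticated JSON POST request."""
    return Request(
        url=f"{base_url.rstrip('/')}{API_PREFIX}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_remote_request(base_url: str, token: str, org_id: str,
                         cloud_url: str, cloud_token: str,
                         cloud_org_id: str) -> Request:
    """Request that registers a cloud instance as a remote (fields assumed)."""
    payload = {
        "name": "cloud-remote",  # hypothetical name
        "orgID": org_id,
        "remoteURL": cloud_url,
        "remoteAPIToken": cloud_token,
        "remoteOrgID": cloud_org_id,
    }
    return build_post(base_url, "/remotes", token, payload)


def build_replication_request(base_url: str, token: str, org_id: str,
                              remote_id: str, local_bucket_id: str,
                              remote_bucket_id: str) -> Request:
    """Request that replicates one local bucket to one remote bucket (fields assumed)."""
    payload = {
        "name": "edge-to-cloud",  # hypothetical name
        "orgID": org_id,
        "remoteID": remote_id,  # ID returned when the remote was created
        "localBucketID": local_bucket_id,
        "remoteBucketID": remote_bucket_id,
    }
    return build_post(base_url, "/replications", token, payload)
```

Passing either returned object to `urllib.request.urlopen` would send it. The sketch stops short of that so it can be read and adapted without a running instance.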
Giving distributed applications an edge
Edge Data Replication gives developers a fast and durable way to stream edge data to the cloud, with minimal configuration. This capability gives users the data they need in the cloud to build a global view of all edge activity. Benefits include:
- Integrate edge and cloud workloads: Enables businesses to quickly engage in both edge and cloud computing for their time series workloads, without requiring engineering to introduce or manage any third-party software.
- Reduce network costs: Edge Data Replication helps users intelligently reduce the volume of data they send to the cloud by using the built-in capabilities of the Flux task engine. Operators of OSS nodes at the periphery can pre-compute aggregations and apply filters so that only the most critical data is sent over the internet. They may also use this feature to filter or compute interesting events and forward only those events to the cloud for monitoring.
- Transform edge data: Dynamically add the context that edge data needs for processing in the cloud, such as extra dimensions required by queries or feature engineering.
- Enforce edge-cloud consistency: Withstand planned and unplanned disruptions in network connectivity by queueing data at the edge until it can be safely transferred to the cloud.
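The "Reduce network costs" item above describes pre-computing aggregations at the edge so that fewer points cross the network. On an OSS node this would be done by a Flux task writing to a replicated bucket. The sketch below uses Python only to illustrate the idea, and the `downsample_mean` function, the `Point` type, and the choice to label each window by its start time are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List, Tuple

Point = Tuple[int, float]  # (unix timestamp in seconds, value)


def downsample_mean(points: Iterable[Point], window_s: int) -> List[Point]:
    """Group points into fixed windows and return one mean value per window."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        window_start = ts - (ts % window_s)  # align to the start of the window
        buckets[window_start].append(value)
    # One output point per window, ordered by window start time.
    return [(start, fmean(values)) for start, values in sorted(buckets.items())]
```

With a 60-second window, four raw points sampled every 30 seconds collapse into two, which halves the volume sent to the cloud while the raw data stays available locally.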
Reimagining the TSDB architecture
Data architects need a robust and repeatable way to design systems that work within the constraints of both the edge and the cloud; systems that understand each of their respective strengths and weaknesses. With Edge Data Replication, we are unlocking a way to accommodate both edge and cloud data workloads with flexibility in the collection, analysis, and storage of time series data – no matter its point of origin or intended destination.
Meeting developers where they are
Today’s announcement is an early step in InfluxData’s broader vision to support tomorrow’s technologies, applications, and developer challenges with the InfluxDB platform. It is always our goal to meet developers where they are. With this launch, we’re helping them deliver faster Time to Awesome so they can uncover critical edge insights and move on to other aspects of building their applications.
Get started with Edge Data Replication today
To start building your edge-cloud pipeline, sign up for InfluxDB Cloud if you haven’t already, then check out the documentation. To learn more about enabling edge-cloud duality of time series data, sign up for our upcoming EDR webinar on June 28.