The 5 Ws (and 1H) of InfluxDB Edge Data Replication
By Jason Myers / Jul 13, 2022 / InfluxDB, Developer, Community
As more businesses generate and process data at the edge, the need to share data from edge nodes to a centralized cloud location increases. Replicating data from the edge to the cloud ensures consistency across an entire application and creates an uninterrupted historical record that preserves the critical context of time.
Edge Data Replication (EDR) is a feature available in InfluxDB designed to address this challenge. It facilitates efficient workflows between edge and cloud instances of InfluxDB to give users more control over their data and how they use it.
Who: Users and businesses that rely on highly distributed systems and applications. Those that lean heavily into edge computing, using edge nodes to process and manage data, will likely have a strong interest in EDR. These users typically have a dual need for data at the edge and in the cloud. Using edge data at the edge gives local operators real-time insight into on-site processes, while sending that same data to the cloud provides visibility into global operations in a central location.
What: InfluxDB is an ideal platform for use at the edge. Combined with the new EDR capability, InfluxDB provides reliable, durable, and automatic replication of data from the edge to the cloud for edge computing workloads. EDR uses an on-disk queue to handle outbound data, so if errors or connection outages occur, InfluxDB continues to write data to the queue at the edge. Once connectivity resumes, EDR automatically flushes the queue to the cloud.
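To make the queueing behavior concrete, here is a minimal Python sketch of a durable on-disk queue that keeps accepting writes during an outage and drains once the connection returns. It is an illustration of the general technique only, not InfluxDB's actual implementation: the `DiskQueue` class, the JSON-lines file format, and the `send` callback are all assumptions made for this example.

```python
import json
import os
import tempfile
from typing import Callable, List


class DiskQueue:
    """Toy durable queue (hypothetical): one JSON record per line in a file."""

    def __init__(self, path: str) -> None:
        self.path = path
        # Create the file if it does not exist so reads never fail.
        open(self.path, "a").close()

    def enqueue(self, record: dict) -> None:
        # Append and fsync so the record survives a process crash.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def pending(self) -> List[dict]:
        # Read back every record that has not been sent yet.
        with open(self.path, "r", encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def flush(self, send: Callable[[dict], bool]) -> int:
        """Send records in order; stop at the first failure and keep the rest."""
        records = self.pending()
        sent = 0
        for record in records:
            if not send(record):
                break
            sent += 1
        # Atomically rewrite the file with only the unsent records.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            for record in records[sent:]:
                f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)
        return sent


def demo() -> tuple:
    """Simulate an outage followed by restored connectivity."""
    with tempfile.TemporaryDirectory() as d:
        q = DiskQueue(os.path.join(d, "queue.jsonl"))
        q.enqueue({"sensor": "a", "value": 1.0})
        q.enqueue({"sensor": "a", "value": 2.0})
        sent_offline = q.flush(lambda r: False)  # remote unreachable
        still_queued = len(q.pending())
        sent_online = q.flush(lambda r: True)    # connectivity restored
        left = len(q.pending())
        return sent_offline, still_queued, sent_online, left
```

In this sketch, a failed send leaves the record on disk, so nothing is lost while the remote is unreachable, and the next successful flush delivers the backlog in its original order.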
When: EDR is available now and executes replication in real time.
Where: EDR creates a data pipeline between the edge and the cloud. If you’re already using InfluxDB at the edge, you can easily configure the EDR feature. Otherwise, download InfluxDB OSS and try it out. For InfluxDB Cloud, users don’t need to do anything except watch the data roll in (at which point, feel free to start working with it).
Why: The reality of many distributed systems is that users on the edge have very different data needs than users working in the cloud. Edge users often need high-precision, highly granular data to power real-time dashboards to manage on-site operations. Meanwhile, cloud users need coarser data from the edge to build a global view of all their edge activity. EDR makes it easy to move data (or a data subset) from the edge to the cloud so that the data you need is always where you need it.
How: With just a few commands in the influx CLI (one to create a remote connection and one to create a replication stream), you can configure EDR in InfluxDB to send data from your durable queue on the edge to an instance of InfluxDB Cloud. Because EDR works at the bucket-to-bucket level, running InfluxDB at the edge means that you can transform and aggregate your data before it reaches the replication bucket. You can use Flux, tasks, or a combination of the two to clean data on the edge before sending it to the cloud. Sending less data lowers transfer costs, reduces cloud storage needs, and makes the processes you run in the cloud more efficient.
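To illustrate the kind of aggregation you might run on the edge before data reaches the replication bucket, the following Python sketch averages high-resolution points into coarser fixed windows. In InfluxDB this work would typically be done by a Flux task; the `downsample_mean` function, the sample data, and the 60-second window are hypothetical and serve only to show why the replicated data set is smaller than the raw one.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def downsample_mean(
    points: List[Tuple[int, float]], window_s: int
) -> List[Tuple[int, float]]:
    """Group (timestamp_seconds, value) points into fixed windows and average each."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its window.
        window_start = ts - (ts % window_s)
        buckets[window_start].append(value)
    # Emit one averaged point per window, ordered by window start time.
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Hypothetical raw edge readings: (seconds since start, measured value).
raw = [(0, 10.0), (20, 20.0), (40, 30.0), (60, 40.0), (100, 60.0)]

# Five raw points collapse into two one-minute averages.
coarse = downsample_mean(raw, window_s=60)
print(coarse)  # [(0, 20.0), (60, 50.0)]
```

Local operators keep the five raw points for real-time dashboards, while only the two averaged points would need to travel to the cloud.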