What is database high-availability?
Ensuring that data is always available to employees, partners, customers and systems is a top priority for any organization: in some scenarios, minutes of downtime can result in significant loss of revenue and reputation. Database high-availability (HA) aims to minimize that downtime. However, there is no “one size fits all” approach to delivering HA. Unique application attributes, business requirements, operational capabilities, budgetary constraints and legacy infrastructure can all influence your technology selection. And the technology is only one element of a complete HA solution: the administrators, and the processes they use to guarantee uptime, are just as important.
What options do you have in creating highly-available InfluxDB systems?
Although this list is not exhaustive, most of the highly-available InfluxDB systems we’ve seen deployed in production, and that we support, fall into one of these categories:
- High-Availability with Nginx
- Fully Customized High-Availability
- InfluxDB Relay
- Clustering with Geographic Replication
The strategy that will work best for your use case depends on a variety of factors. For example:
Where will you be hosting your system?
Depending on where you are hosting InfluxDB, there might be built-in high-availability software or services you can take advantage of with little or no integration work. For example, InfluxData Cloud offers automatic backups, and backups are similarly straightforward to implement on clouds like AWS.
What type of hardware will you be running on?
Under-performing hardware or flaky networks will most likely lead to system instability in high-load situations, so it is best to solve those problems first: fixing them alone makes the system noticeably more available. We’ve seen customers whose availability issues were caused by under-powered hardware or by poor schema and query design; once those were remedied, their systems became markedly more available immediately.
How much coding are you willing to do yourself, and how much will you rely on off-the-shelf software maintained by someone else?
You’ll need to consider how much code you want to maintain yourself to get just the right amount of customization, integration and automation for your system. Often, it is easier to relax a requirement or change your design so that you can incorporate ready-built software. This trade-off often saves time and money, and it avoids the ongoing cost of maintaining custom code.
Other questions to consider:
- How much automation do you need vs manual intervention?
- How much management and monitoring overhead are you willing to incur to maintain the system?
- How much money are you willing to spend for availability?
- How much downtime is acceptable? (Or how much will downtime cost you?)
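One way to put a number on that last question is to turn an availability target A into a downtime budget D over a period T. The worked example below assumes a 365-day year (T = 8,760 hours) and is only a back-of-the-envelope aid for comparing targets:

```latex
D = (1 - A)\,T, \qquad T = 365 \times 24 = 8760\ \text{h}

A = 99.9\% \;\Rightarrow\; D = 0.001 \times 8760\ \text{h} = 8.76\ \text{h per year}

A = 99.99\% \;\Rightarrow\; D = 0.0001 \times 8760\ \text{h} = 0.876\ \text{h} \approx 52.6\ \text{min per year}
```

Each extra “nine” cuts the budget by a factor of ten, which is usually where the cost of the options below starts to climb.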
High-Availability Solutions Overview
InfluxDB’s backup and restore functionality provides the simplest way to achieve a basic level of availability. In fact, even clustered deployments have some sort of backup procedure in place. This is the first procedure to understand thoroughly before designing a more complex high-availability architecture.
Learn More: InfluxDB Backup and Restore Documentation
High-Availability with Nginx
We’ve talked to InfluxDB users who have achieved high availability by double- or triple-writing their data through Nginx. This method is quick to set up and gives users basic HA, as long as they also use backups for disaster recovery.
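A rough way to see why double or triple writing helps: suppose every server holds a full copy of the data and each is up a fraction a of the time, independently of the others. That independence is an idealized assumption; shared networks, power, or operator mistakes easily violate it. Reads can then be served whenever at least one of the n copies is up:

```latex
A_n = 1 - (1 - a)^n

a = 0.99,\ n = 2 \;\Rightarrow\; A_2 = 1 - 0.01^2 = 0.9999

a = 0.99,\ n = 3 \;\Rightarrow\; A_3 = 1 - 0.01^3 = 0.999999
```

Two servers in the same rack behind the same switch will do much worse than this formula suggests, which is one reason backups remain part of the picture.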
Need Help? InfluxData’s Professional Services can help you architect and implement a high-availability solution based on Nginx. Contact us for details.
Fully Customized High-Availability
Another option is to create a lightweight proxy layer, either in your application code or as a microservice in front of your InfluxDB nodes. To achieve high availability, the proxy only needs to write every incoming request to two InfluxDB servers (double writes).
Need Help? InfluxData’s Professional Services can help you architect and implement a fully customized high-availability solution that meets your exact needs. Contact us for details.
Sharding is an enhancement you can add on top of a high-availability architecture. It lets you scale out beyond a fully replicated setup, in which every node stores a copy of all the data.
If your application or proxy is written in Go, you can examine InfluxDB queries using the open source InfluxQL package and use it to choose what to shard data on. We’ve seen users shard on either the measurement name or a tag like `customer_id`.
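The shard-selection step itself can be sketched in Go. Rather than the InfluxQL package (which parses queries), this sketch looks at the write path: it extracts the measurement name, or a tag value such as `customer_id`, from a line-protocol point and hashes it with FNV-1a to pick one of n shards. The helper names are hypothetical, and the parser deliberately ignores line-protocol escaping, so treat it as an illustration rather than a production parser.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// measurementOf returns the measurement name of one line-protocol point,
// e.g. "cpu" for "cpu,host=a value=1". Simplification: escaped commas and
// spaces are not handled.
func measurementOf(line string) string {
	end := strings.IndexAny(line, ", ")
	if end < 0 {
		return line
	}
	return line[:end]
}

// tagValue returns the value of tag key in a line-protocol point, or "" if
// the tag is absent. Same escaping simplification as measurementOf.
func tagValue(line, key string) string {
	// The series key (measurement plus tags) ends at the first space.
	seriesKey := line
	if sp := strings.IndexByte(line, ' '); sp >= 0 {
		seriesKey = line[:sp]
	}
	parts := strings.Split(seriesKey, ",")
	for _, kv := range parts[1:] {
		if k, v, ok := strings.Cut(kv, "="); ok && k == key {
			return v
		}
	}
	return ""
}

// shardFor maps a shard key to one of n shards with a stable FNV-1a hash,
// so the same key always lands on the same shard.
func shardFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

func main() {
	points := []string{
		"cpu,host=a,customer_id=42 value=0.5",
		"mem,host=a,customer_id=42 used=1024i",
		"cpu,host=b,customer_id=7 value=0.9",
	}
	const shards = 4
	for _, p := range points {
		fmt.Printf("%-40s measurement -> shard %d, customer_id -> shard %d\n",
			p, shardFor(measurementOf(p), shards), shardFor(tagValue(p, "customer_id"), shards))
	}
}
```

Because the hash is stable, every point for a given measurement (or customer) lands on the same shard. In a combined setup, each shard would itself be a replicated pair of servers, and queries would have to be routed with the same function.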
Need Help? InfluxData’s Professional Services can help you architect and implement a sharded solution that meets the exact requirements of your use cases. Contact us for details.
InfluxDB Relay
The open source InfluxDB Relay project (available on GitHub) adds a basic high-availability layer to InfluxDB. With the right architecture and disaster recovery processes, it can help you achieve a highly-available setup.
InfluxDB Relay’s architecture is straightforward. It consists of a load balancer, two or more InfluxDB Relay processes and two or more InfluxDB processes. The load balancer should route UDP traffic and HTTP POST requests to the path `/write` to the relays, and GET requests to the path `/query` to the InfluxDB servers.
Running InfluxDB Relay is simple: it requires only a single binary and a configuration file. We provide several sample configurations with the project. If you want to add a recovery process and sharding to your InfluxDB Relay setup, check out the documentation for more details.
The basic InfluxDB Relay setup sends reads directly to InfluxDB, while writes get sent through InfluxDB Relay as illustrated below:
Contact Us: Need help setting up, configuring and running InfluxDB Relay in production? Contact us for more information on how we can help.
Influx Enterprise’s clustering features provide the highest degree of fault-tolerance, automation and scalability. This functionality will first become available on the InfluxCloud platform, and on-premise shortly thereafter.
Contact Us: Get on the early access list for Influx Enterprise by contacting us for more information on availability and pricing.
Clustering with Geographic Replication
Database clustering with geographic replication is a complex and expensive architecture to implement and support. Influx Enterprise will make designing this type of solution much less costly and complex.
Contact Us: Get on the early access list for Influx Enterprise with support for geographic replication by contacting us for more information on availability and pricing.