Understanding InfluxDB Enterprise: What is a Cluster?

By Katy Farmer / Use Cases, Developer, Product
May 08, 2018

If you’re getting started with InfluxDB Enterprise, you might have some questions. That’s always good news to us; questions mean you want to know more.

Today’s question is: what is a cluster?

In general, you probably know what a cluster is. I can have a cluster of nodes or a cluster of grapes. But what does it entail in the world of InfluxDB Enterprise?

Pictured: Clustering now available in Puppy Enterprise

An InfluxDB Enterprise cluster consists of two types of nodes: meta nodes and data nodes.

You can read the official documentation, and we’ll talk about the fundamentals here.

Data Nodes

All of your raw time series data lives on data nodes. Your instance(s) of Influx and/or Kapacitor are cozy here. They don’t care about consensus, and they don’t participate in the decision-making. Data nodes are here to hold your data.

For high availability, you need a replication factor of at least 2. Replication factor simply means how many copies of any one shard should exist. The number of data nodes should be divisible by the replication factor. You DO NOT want a prime number of data nodes for very computer science reasons. Trust us.
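To make the sizing rule concrete, here is a minimal Python sketch that checks a proposed data node count against a replication factor. The function name `check_data_nodes` and the warning strings are made up for illustration; they are not part of InfluxDB Enterprise.

```python
def check_data_nodes(data_nodes: int, replication_factor: int) -> list[str]:
    """Return a list of warnings for a proposed data node layout (empty if none)."""
    if data_nodes < 1 or replication_factor < 1:
        raise ValueError("node count and replication factor must be positive")

    warnings = []
    # High availability needs at least two copies of every shard.
    if replication_factor < 2:
        warnings.append("replication factor below 2: no high availability")
    # The node count should divide evenly by the replication factor.
    if data_nodes % replication_factor != 0:
        warnings.append("data nodes not divisible by replication factor")
    return warnings


print(check_data_nodes(4, 2))  # no warnings
print(check_data_nodes(3, 2))  # flags the uneven layout
```

Four data nodes with a replication factor of 2 pass both checks; three data nodes with the same replication factor trip the divisibility check.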

Hardware

CPU: 8+ cores
RAM: 64+ GB
Disk: SSD drives (>1000 IOPS recommended)

Meta Nodes

Meta nodes have a simple job: keep state consistent. Simple, but really important when it comes to your data. You never want to worry about which shard lives on which node or reaching consensus; this is the type of complex and vital data that meta nodes manage. Meta nodes know only basic information about state, such as retention policies, users, and databases.

The most important thing to remember is that you need an odd number of meta nodes. Think of trying to decide where to get lunch with your friends. If there’s a tie between burritos and burgers, you’ll probably starve before you decide. But if there’s an odd number of you, a tie is out of the question. An odd number of meta nodes ensures they can always reach consensus (if you’re curious about consensus, we use the Raft consensus algorithm) to make decisions.

If you need high availability from your cluster, you need at least 3 meta nodes. If you only have 1 meta node, you’ve met the criteria for the odd number necessary for consensus, but you don’t have any meta nodes if that one becomes unavailable. Two is crazy talk. Don’t you remember the most important rule? ODD NUMBERS ONLY. That brings us to three meta nodes: room for high availability and for the odd number.
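The arithmetic behind the odd-number rule can be sketched in a few lines of Python. This assumes the standard majority quorum that Raft uses (more than half of the nodes must agree); the helper names `quorum` and `failures_tolerated` are ours, not an InfluxDB API.

```python
def quorum(meta_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return meta_nodes // 2 + 1


def failures_tolerated(meta_nodes: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return meta_nodes - quorum(meta_nodes)


for n in range(1, 6):
    print(f"{n} meta node(s): quorum={quorum(n)}, "
          f"failures tolerated={failures_tolerated(n)}")
```

Under this assumption, two meta nodes tolerate zero failures, the same as one, and four tolerate only one, the same as three. The even counts add hardware without adding availability.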

Hardware

CPU: 1-2 cores
RAM: 512 MB - 1 GB
Disk: 1 HDD (any size)
Can be run in a VM or container

A Note on Load Balancers

Technically, you can have a cluster without a load balancer, but unless you’re really into manually routing requests to individual nodes, you’re probably going to use a load balancer. Influx is load balancer-agnostic, so choose the one that you like and trust. We’ll handle the rest.
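If you're wondering what "routing requests to individual nodes" looks like, here is a toy round-robin sketch in Python. It only illustrates the idea of spreading requests across data nodes; the hostnames are hypothetical, and a real load balancer also handles health checks and failover, which this does not.

```python
from itertools import cycle

# Hypothetical data node addresses, for illustration only.
DATA_NODES = [
    "http://data-node-1:8086",
    "http://data-node-2:8086",
]


def round_robin(nodes):
    """Yield node addresses in order, forever, starting over at the end."""
    return cycle(nodes)


picker = round_robin(DATA_NODES)
# Pick a target for each of four incoming requests.
targets = [next(picker) for _ in range(4)]
print(targets)
```

Each request goes to the next data node in the list, so the four requests alternate between the two nodes.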

Summary

InfluxDB Enterprise clusters are made of meta nodes and data nodes (and usually a load balancer). The minimum setup for high availability is three meta nodes and two data nodes.
