Recorded: April 2017
In this video, we will show you how to rebalance your InfluxEnterprise clusters to keep your solution performant and highly available.
Watch the webinar “Rebalancing your InfluxEnterprise clusters” by clicking on the download button on the right. This will open the recording.
Here is an unedited transcript of the webinar “Rebalancing your InfluxEnterprise clusters.” This is provided for those who prefer to read than watch the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.
• Regan Kuchan: Technical Writer, InfluxData
Regan Kuchan 00:03.682 Good morning. I am Regan, and I am a Technical Writer here at InfluxData. And today I will be focusing on working with InfluxData’s InfluxEnterprise product. InfluxEnterprise is the clustering version of the open source Time Series Database that you know as InfluxDB. Like InfluxDB, InfluxEnterprise is a Time Series Database that’s built from the ground up to handle high writing query loads. InfluxEnterprise, however, includes proprietary functionality that specializes in working with projects that require high availability and horizontal scalability. I won’t be covering everything about InfluxEnterprise in this presentation. For today, I’m going to focusing on how to rebalance an InfluxEnterprise cluster. Currently, this is a manual process. There is a tool in the works that will automate this process. But until then, these are the steps that you will need to take to manually rebalance a cluster.
Regan Kuchan 01:09.281 Let’s start with an overview of what I’ll be talking about today. First, I’ll go over the cluster jargon and functionality that you need to know about before you dive into a rebalance. This is an overview. So again, I won’t be covering everything about InfluxEnterprise. But you will know about the relevant cluster components for performing a rebalance. Next, I’ll explain what a rebalance is and why you would need to do one. And after that, I’ll walk you through the two primary scenarios for performing a rebalance. And I’ll give step-by-step instructions for each scenario. By the end of this presentation, I’m hoping that you’ll be comfortable with the relevant clustering terminology as well as some of the necessary tooling for doing a rebalance of your own. Here is the basic terminology that you need to know to get through this presentation.
Regan Kuchan 02:09.335 One, InfluxEnterprise is a cluster. And a cluster is made up of several nodes. Each of those key part cuboctahedrons on this slide represents one node in a cluster. So in this case, I have five nodes. A node can be a meta node or a data node. Meta nodes and data nodes do different things. For the purposes of this presentation, I’ll be focusing on data nodes and only on data nodes. Meta nodes are very important, but you don’t really directly work with them when you’re doing a rebalance. The primary role of data nodes is that they store your time series data. They store all the measurement information, the tag information, and the field information, and the actual data points.
Regan Kuchan 03:02.541 Data nodes store data in things called shards. Shards are kind of like files that cover a specific time range. The system looks at the time stamp on an incoming data clump and writes that point to the relevant shard. So in this scenario, a data point with a time stamp 12:30 will be written to shard 1 because shard 1 covers the time range between 12:00 and 12:59. A data point with a timestamp 3:45 on the other hand, will be written to shard 4 because shard 4 covers the time range between 3:00 and 3:59. Full disclosure. This is a bit of a simplified description of what goes on in a cluster. But it’ll do for the purposes of this presentation. The last thing that you need to know about before we dive into the rebalance is the replication factor.
Regan Kuchan 04:00.029 Replication factor is part of a retention policy, and it determines how many data nodes have a copy of a shard. So here we have shard 1. Shard 1 belongs to a retention policy that has a replication factor of two. So shard 1 has two owners, that’s data node one, and data node two. Now you know what everything is. But what is an unbalanced cluster? An unbalanced cluster is a cluster with an unequal distribution of shards across the data nodes. So here on this slide, we have an unbalanced cluster. Data nodes one and two have four shards each, and data node three only has one shard. The shards aren’t equally distributed across the cluster, so it’s unbalanced. In practice, assuming you have a replication factor of three, a balanced cluster would have shards 1, 2, and 3 on data node three as well.
Regan Kuchan 05:08.199 So how do clusters become unbalanced? How do you get into a state where shards are unevenly distributed across your data nodes? There are several ways, but I’m only going to be focusing on one explanation for today. That explanation is a cluster becomes unbalanced when you add a data node to a cluster. So if you remember the unbalanced cluster from the previous slide, the two original data nodes have shards 1, 2, 3, and 4. The new data node, that’s data node three, has shard 4 only. So that means that sometime between the time that the system created shard 3, and before the system created shard 4, we added data node three. So why would you get your cluster into this unbalanced state? Why would you add a data node in the first place? There are two performance reasons for adding a data node. The first reason is to increase cluster disk size, or cluster write throughput. The second reason is to increase data availability and/or query throughput. The proper rebalance path depends on your reason for adding a data node.
Regan Kuchan 06:23.052 In the next part of this webinar, I’m going to walk through the two possible rebalance paths after you add a data node to a cluster. Before I start, I make a couple of assumptions when performing these rebalances. First, I assume that I’m adding a third data node to a cluster that previously only had two data nodes. Next, I assume that that cluster currently has a replication factor of two. The rebalance procedures that I’ll go through are applicable to different cluster sizes with different replication factors. But just know that some of these specific user-provided values in the next slides will depend on your setup. Just know that the details of a rebalance depend on your cluster’s replication factor, its number of data nodes, and for how long you’ve been writing data to the system. My last assumption before we begin is that you’re only working with real-time data. Real-time data are data that are arriving in the present, with the current timestamp. If you’re writing historical data to your cluster, so data with timestamps that occur in the year 1899, or even yesterday, the rebalance steps will be very different from what I’m about to show you.
Regan Kuchan 07:48.454 Let’s start out with rebalance scenario one. The purpose of this scenario is to increase availability for queries and query throughput. To give you a visual overview of what we’ll be doing in the next steps, we have a two data node cluster that currently has a replication factor of two. We’re going to add a third data node to that cluster. And we’ll update the replication factor to three. Finally, we’ll safely copy the existing shards from the old data nodes to the new data node. To start off, I add a third data node to my cluster, using the influxd-ctl tool. The influxd-ctl tool is available on all meta nodes in a cluster. This is the only time that I’ll mention meta nodes in this presentation.
Regan Kuchan 08:45.012 Here, I run the influxd-ctl tool on a meta node that’s part of the cluster. And I add the new data node with the hostname cluster 03 that’s running on port 8088. In the before code block on the left, the influxd-ctl tool show command reveals that I have just two data nodes in my cluster, that’s cluster 1 on port 8088, and cluster 2 on port 8088. In the second code block—this is after I ran the add data command—I have three data nodes, that’s cluster 1, cluster 2, and cluster 3, all running on port 8088. Next, I increase the replication factor of my retention policy from two to three. This step ensures that when the system creates any new shards, they’ll automatically be distributed across all three data nodes. Distributing the shards across all three data nodes increases those shards’ data availability for queries and query throughput. In this example, I only have one database and one retention policy in my cluster. So I only need to run the alter retention policy command once.
Regan Kuchan 10:02.814 In reality, you’ll probably have more than one database and retention policy in your cluster. And you’ll need to do this step for every database-retention policy combination in your cluster. The command on this slide changes the replication factor of the autogen retention policy in the Telegraf database from two to three. To double-check that my alter retention policy query was successful, I run the show retention policies query and see that my replication factor for the autogen retention policy in my Telegraf database is now three. Great. Now comes the fun part. I truncate all hot shards. Before I get into it, there are two questions that I need to answer. First, what is a hot shard? And second, why do I need to truncate them?
Regan Kuchan 10:58.804 Hot shards are shards that are currently receiving writes. They’re the shards with a time range that ends in the future. Cold shards are shards that are no longer receiving writes. They’re the shards with a time range that is entirely in the past. Performing any action on a hot shard can lead to data inconsistency within a cluster, and that is bad, and it requires manual intervention. To prevent data inconsistency, we truncate hot shards. Truncating a shard creates a new hot shard that is automatically distributed across all three data nodes in the cluster because we’ve already updated the replication factor. All previous writes are now stored in cold shards. When I run influxd-ctl truncate shards, the expected output is simply truncated shards. Once I’ve done that, I can work on distributing the cold shards across the cluster without the threat of inconsistency. Any hot shards or newly created shards will be automatically distributed across the cluster, and they don’t require any further attention.
Regan Kuchan 12:20.976 Now it’s time to identify the cold shards that we want to manually distribute across the cluster. To do that, we just run the influxd-ctl show-shards command. This slide shows a simplified version of the show shards output. The first thing to notice is the asterisk on the first timestamp in the end column. That asterisk means that the system truncated that shard. The next thing to notice is the owner’s column. The first shard, shard seven, appears on two data nodes in the cluster. And the second shard, shard nine appears on three data nodes in the cluster. Finally, the last thing to notice is the actual timestamp in the end column. For the first shard, the end timestamp occurs in the past. And for the second shard, the timestamp occurs in the future. So actually, it’s in the past. But for the sake of this example, just assume that it’s just before February 2nd, 2017 at midnight.
Regan Kuchan 13:28.067 All of these indicators reveal that the first shard is cold for writes, and the second shard is hot for writes. Because all hot shards are automatically distributed across the three data nodes in the cluster, you can ignore that one. The only information that you need to take away from all this is the idea of the cold shard and the TCP address of one of its current owners. In this case, we just take shard ID 7 and the TCP address, cluster 02 8088. In reality, depending on the age of your cluster and the length of your shards, you’ll have more than one cold shard in your cluster. The next step is to copy the cold shards to the new data node. This step increases those shards’ data availability for queries and query throughput. Once again, we use the influxd-ctl command.
Regan Kuchan 14:28.192 The syntax is relatively straightforward. It’s influxd-ctl copy shard followed by the source TCP address—that’s the address of the data node that currently has the shard—then followed by the destination TCP address—that’s the address of the new data node. And finally, the ID of the cold shard. The code block shows the command in action. Here we take shard 7 that’s currently on cluster 02, running on port 8088. We copy it to the data node at cluster 03, also running on port 8088. A successful copy shard command returns copy shard seven from the source data node to the destination data node. You’ll need to repeat this command for every cold shard that you’d like to move to the new data node.
Regan Kuchan 15:23.898 We’re almost done with rebalancing the [area of one?]. The last step is to confirm that the rebalance took place. There are a couple ways to do this. The first way is to rebalance the output of the influxd-ctl show shards command. Shards that were successfully copied to the new data node should list that data node as an owner in the owner’s column. The second way, which we recommend doing in addition to the first option, is to compare the shard directories on the source data node and on the destination data node. Shards are located in the var/lib/influxdb/data/relevant_database_name/relevant_retention_policy_name/shard_id/directory.
Regan Kuchan 16:12.361 In the results in the slide, you can see that the tsm file for shard seven and the Telegraf database and autogen retention policy is identical and appears on both the source data node and the destination data node. Do this for every shard that you copied over, and then you’re good to go. You successfully added a data node to your cluster and rebalanced it in order to increase availability for queries and query throughput. Now I’m going to walk through rebalance scenario two. The purpose of this scenario is to expand the disk size of the cluster or increase right throughput. To give you a visual overview of what we’ll be doing, in the next steps we have a two data node cluster that currently has a replication factor of two. We add a third data node to that cluster. Notice that we do not update the retention policies replication factor to three. Then we safely copy some of the existing shards from the old data nodes to the new data node. And then we remove this successfully copied shard from the old data nodes.
Regan Kuchan 17:21.606 I’ll go through each of these steps and focus primarily on the new parts of the process that are specific to the second rebalance procedure. Step one is the same as the first rebalance– is the same as the first rebalance scenario. We use the influxd-ctl tool on one of the clusters existing meta nodes to add a third data node to the previously two data node cluster. Now, unlike rebalance procedure one, we do not update the replication factor of every retention policy from two to three. We do not perform any step because, this time, our goal is to expand the disk size of the cluster and increase write throughput. We won’t be able to get there if we simply start writing all new data to all three data nodes. This time we intentionally leave all replication factors at two. That number is less than the number of data nodes in the cluster.
Regan Kuchan 18:27.128 The second step for the second rebalance scenario is to truncate hot shards with the influxd-ctl tool. You’ll remember from the previous scenario that we truncate hot shards to prevent data inconsistency in the cluster. In this case, truncating hot shards creates new hot shards that are automatically distributed to two data nodes in the cluster. This could be on the two original data nodes or on one of the original data nodes and a new data node. This means that we don’t need to worry about or touch those hot shards or any newly created shards because they’re automatically distributed across all data nodes in the cluster. The old shards, the ones that were initially created when there were only two data nodes in the cluster and only appear on those two data nodes, are now stored in cold shards. So it’s safe to move those shards around the cluster without the threat of data inconsistency.
Regan Kuchan 19:29.367 Next, we identify the cold shards that we want to move around the cluster with the influxd-ctl show shards command. To give you a little more experience working with the show shards output, I’ve changed the sample content event for this rebalance scenario. Before I start talking about them, let’s assume that today is actually January 26th, 2017 at say 6:45 PM. The first shard on this slide, shard 21 ended today at 6:00 PM. So it ends in the past. This indicates that shard 21 is a cold shard. You’ll also notice that its owners are on the two original data nodes with the hostnames cluster 01 and cluster 02. Like shard 21, shard 22 also ends in the past and belongs to the two original data nodes. The asterisk on shard 22’s end timestamp indicates that it was truncated. So shard 22 is definitely a cold shard.
Regan Kuchan 20:36.141 The last shard, shard 24, ends in the future. Remember, it’s January 26th at 6:45 PM. Shard 24 belongs to one of the original data nodes, that’s cluster 02 and the new data node, that’s cluster 03. Shard 24 is a hot shard. And it’s already on the new node. So we can ignore shard 24 for the rest of the rebalance process. The information that we need to take away from all this is the idea of the cold and the TCP address of one of the data nodes that owns that shard. Now, we use the information that we gathered to copy the cold shards to the new data node, and we use influxd-ctl’s copy shard argument to do it. In the examples on this side, we copy shard 21 from the cluster 01 data node to the cluster 03 data node, and we copy shard 22 from the cluster 02 data node to the cluster 03 data node.
Regan Kuchan 21:41.269 This next step is extremely important. Here we confirm that shards 21 and 22 were, in fact, copied to the new data node. We confirm this in two ways. The first way is to rerun the show shards command. Here, you see that shard 21 and shard 22 now appear on all three data nodes, one of which is the new data node at cluster 03. The second confirmation step is to look at the actual shard files on the source data node and on the destination data node. You can see that the individual tsm files for shards 21 and 22 do appear on both data nodes. If either of these confirmation methods do not show the expected output, we recommend repeating the copy shard step. We’re being extremely cautious here because the next step is destructive.
Regan Kuchan 22:41.320 The destructive step is to remove the now unnecessary shards from the cluster. This step is specific to rebalance procedure two and will actually expand the disk size of your cluster and increase write throughput. We use the influxd-ctl tool to remove shards from a cluster. The syntax is influxd-ctl, remove shard, followed by the TCP address of the data node that you’d like to remove the shard from, and then followed by the relevant shard ID. Removing a shard is an irrecoverable, destructive action. And we recommend exercising caution when using this command. In the example on the side, we remove shard 21 from the data node at cluster 01 and remove shard 22 from the data node at cluster 02. The expected output of the command is just remove shard followed by the shard ID from and the TCP address of the relevant data node.
Regan Kuchan 23:47.234 The final step is to confirm that the rebalance did everything that you wanted it to. Once again, run the influxd-ctl show shards command. Focusing on the owner’s column, notice that shards 21 through 24 now, correctly have two owners each, and they’re evenly distributed across all three data nodes in the cluster. Shard 21 is on the data nodes at cluster 02 and cluster 03, and shard 22 is on the data nodes at cluster 01 and cluster 03. This is exactly what we wanted to see.
Regan Kuchan 24:22.263 So to summarize, we successfully added a data node to a cluster. We’ve safely copied old shards to that new data node. And we’ve removed the unnecessary shards from the original data nodes. All this rebalanced our cluster to expand its disk size and increase write throughput. As a reward for getting through all of this, here is an adorable French Bulldog who now feels extremely balanced. That’s it for today. To go over it, we’ve covered a little bit about how clusters work underneath it all and two possible cluster rebalance scenarios. The first scenario adds a data node and rebalances the cluster to increase availability for queries and query throughput. The second adds a data node and rebalances the cluster to expand disk size and increase write throughput. The rebalance steps that I went through apply to different cluster sizes and to clusters with different replication factors. But I want to highlight that your rebalance experience will depend a lot on your own cluster setup. Your cluster’s replication factor, its number of data nodes and for how long you’ve been writing data to the system matter a lot.
Regan Kuchan 25:40.738 If you’re interested in learning more or reading about rebalancing a cluster, I highly recommend checking out the rebalance guide in the InfluxEnterprise docs. It goes into a bit more detail than what I covered in this webinar. In addition, if you’ve become intrigued by the influxd-ctl tool that we used throughout this presentation, check out the docs on that tool as well. It could do some cool things for managing your cluster. I hope this helped you get more comfortable with clusters and how to go about rebalancing them. Please feel free to ask any questions, and thank you for listening.