Scaling Throughput and Performance in a Sharding Database System

Navigate to:

This article was originally published in InfoWorld and is reposted here with permission.

Understand the two dimensions of scaling for database query and ingest workloads, and how sharding can make scaling elastic — or not.

Scaling throughput and performance are critical design topics for all distributed databases, and sharding is usually a part of the solution. However, a design that increases throughput does not always help with performance and vice versa. Even when a design supports both, scaling them up and down at the same time is not always easy.

This post will describe these two types of scaling for both query and ingest workloads, and discuss sharding techniques that make them elastic. Before we dive into the database world, let us first walk through an example of elastic throughput and performance scaling from daily life.

Scaling effects in a fast food restaurant

Nancy is opening a fast food restaurant and laying out the scenarios to optimize her operational costs on different days of the week. Figure 1 illustrates her business on a quiet day. For the restaurant to be open, there are two lines which must remain open: drive-thru and walk-in. Each requires one employee to cover. On average, each person needs six minutes to process an order, and the two employees should be able to cover the restaurant’s expected throughput of 20 customers per hour.

Figure 1- The restaurant operation on a quiet day

Figure 1: The restaurant operation on a quiet day.

Let’s assume that an order can be processed in parallel by at most two people, one making drinks and the other making food. Nancy’s employees are trained to go and help with the other line if their line is empty. Doubling up on a single line reduces the order processing time to three minutes and helps keep the throughput steady when customers enter the lines at various intervals.

Figure 2 shows a busier day with around 50% more customers. Adding an employee should cover the 50% increase in throughput. Nancy requests her team to be flexible:

  • If only one customer comes to a line at a time, one person should run between two lines to help reduce the processing time so they will be available to help new customers immediately.
  • If a few customers walk in at the same time, employees should open a new line to help at least two walk-in customers at the same time because Nancy knows walk-in customers tend to be happier when their orders are taken immediately but very tolerant with the six minute processing.

Figure 2 - The operation that covers 50 more customers

Figure 2: The operation that covers 50% more customers.

To smoothly handle the busiest days of the year, which draw some 80 customers per hour, Nancy builds a total of four counters: one drive-thru and three walk-ins, as shown in Figure 3. Since adding a third person to help with an order won’t help reduce the order time, she plans to staff up to two employees per counter. A few days a year, when the town holds a big event and closes the street (making the drive-thru inaccessible), Nancy accepts her max throughput will be 60 customers per hour.

Figure 3 - The operation on a busy day

Figure 3: The operation on a busy day.

Nancy’s order handling strategy elastically scales customer throughput (i.e., scales as needed) while also applying flexibility to make order processing time (i.e., performance) faster. Important points to notice:

  1. The max performance scaling factor (max number of employees to help with one order) is two. Nancy cannot change this factor if she wants to stick with the same food offerings.
  2. The max throughput is 80 customers per hour due to the max number of counters being four. Nancy could change this factor if she has room to add more counters to her restaurant.

Scaling effects in a sharding database system

Similar to the operation at a fast food restaurant, a database system should be built to support elastic scaling of throughput and performance for both query and ingest workloads.

Query workload

Term definition:

  • Query throughput scaling: the ability to scale up and down the number of queries executed in a defined amount of time such as a second or a minute.
  • Query performance scaling: the ability to make a query run faster or slower.
  • Elastic scaling: the ability to scale throughput or performance up and down easily based on traffic or other needs.


Let’s assume our sales data is stored in an accessible storage location such as a local disk or a remote disk or a cloud. Three teams in the company, Reporting, Marketing, and Sales, want to query this data frequently. Our first setup, illustrated in Figure 4, is to have one query node to receive all queries from all three teams, read the data, and return the query results.

Figure 4 - One query node handles all requests

Figure 4: One query node handles all requests.

At first this setup works well but when more and more queries are added, the wait time to get results back becomes quite large. Worse, many times the queries get lost due to timeouts. To deal with the increasing query throughput requests, a new setup shown in Figure 5 provides four query nodes. Each of these nodes works independently for our different business purposes: one for the Reporting team, one for the Marketing team, one for the Sales team focusing on small customers, and one for the Sales team focusing on large customers.

Figure 5 - Add more query nodes

Figure 5: Add more query nodes, one for each business purpose, to handle more throughput.

The new setup catches up well with the high volume of throughput and no queries get lost. However, for some time-sensitive queries that the teams need to react to immediately, waiting several minutes to get the result back is not good enough. To solve this problem, the data is split equally into four shards, where each shard contains data of 12 or 13 states, as shown in Figure 6. Because the Reporting team runs the most latency sensitive queries, a query cluster of four nodes is built for them to perform queries four times faster. The Marketing team is still happy with its single-node setup, so data from all shards is directed to that one node.

Figure 6 - Shard data and add Query Nodes to handle sharded data in parallel

Figure 6: Shard data and add Query Nodes to handle sharded data in parallel.

The Sales team does not deal with time-sensitive queries, but as this team grows larger, the number of query requests keep increasing. Therefore, the Sales team should take advantage of performance scaling to improve throughput and avoid reaching max throughput in the near future. This is done by replacing two independent query nodes with two independent query clusters, one with four nodes and the other two nodes, based on their respective growth.

Figure 7 - Adjust the size of the Reporting cluster

Figure 7: Adjust the size of the Reporting cluster based on the Reporting team’s performance needs and shut down a Sales cluster based on the Sales team’s throughput needs.

During times of the year when the Reporting team does not need to handle time-sensitive queries, two query nodes of its cluster are temporarily removed to save resources, as shown in Figure 7. Similarly, when the Sales team does not need to handle high throughput workloads, it temporarily removes one of its clusters and directs all queries to the remaining one.

The teams are happy with their elastic scaling setup. The current setup allows all teams to scale throughput up and down easily, by adding or removing query clusters. However, the Reporting team notices that its query performance does not improve beyond the limit factor of four query nodes; scaling query nodes beyond that limit doesn’t help. Thus we can say that the Reporting team’s query throughput scaling is fully elastic, but its query performance scaling is only elastic to the scale factor of four.

The only way the Reporting team can scale query performance further is to split data into more and smaller shards, which is not trivial. We’ll discuss this next.

Ingest workload

Term definition:

  • Ingest throughput scaling: the ability to scale up and down the amount of ingested data in a defined amount of time such as a second or a minute.
  • Ingest performance scaling: the ability to increase or decrease the speed of ingesting a set of data into the system.


Figure 8 - One ingest node handles all ingested data

Figure 8: One ingest node handles all ingested data.

In order to have four shards of sales data as described above, the ingest data must be sharded at load time. Figure 8 illustrates an ingest node that takes all ingest requests, shards them accordingly, handles pre-ingest work, and then saves the data to the right shard.

However, when the ingest data increases, one ingest node no longer catches up with the requests and ingest data gets lost. Thus a new setup shown in Figure 9 is built to add more ingest nodes, each handling data for a different set of write requests to support higher ingest throughput.

Figure 9 - Add ingest nodes

Figure 9: Add ingest nodes, each handling a subset of write requests, to support more throughput.

Even though the new setup handles a higher ingest volume of throughput and no data gets lost, the increasing demand of lower ingest latency makes the teams think they need to change the setup further. The ingest nodes that need lower ingest latency are converted into ingest clusters, shown in Figure 10.

Here each cluster includes a shard node that is responsible for sharding the coming data and additional ingest nodes. Each ingest node is responsible for processing pre-ingest work for its assigned shards and sending the data to the right shard storage. The performance of Ingest Cluster 2 is twice that of Ingest Node 1, as the latency is now around half of the previous setup. Ingest Cluster 3 is around four times as fast as Ingest Node 1.

Figure 10 - Convert ingest nodes to ingest clusters to speed up data ingest

Figure 10: Convert ingest nodes to ingest clusters to speed up data ingest.

During times of the year when the latency is not critical, a couple of nodes are temporarily removed from Ingest Cluster 3 to save resources. When ingest throughput is minimal, Ingest Cluster 2 and Ingest Cluster 3 are even shut down and all write requests are directed to Ingest Node 1 for ingesting.

As with their query workloads, the Reporting, Marketing, and Sales teams are very happy with the elastic scaling setup for their ingest workloads. However, they notice that even though ingest throughput scales up and down easily by adding and removing ingest clusters, when Ingest Cluster 3 has reached its scale factor of four, adding more ingest nodes to its cluster doesn’t improve performance. Thus we can say that its ingest throughput scaling is fully elastic, but its ingest performance scaling is only elastic to the scale factor of four.

Preparing for future elasticity

As demonstrated in the examples, the query and ingest throughput scaling of the setups in Figure 6 and Figure 10 are fully elastic, but their performance scaling is only elastic to the scale factor of four. To support a higher performance scaling factor, the data should be split into smaller shards, e.g., one shard per state. However, when we go with a smaller scale factor, many shards must be mapped to one query node in the query cluster. Similarly, one ingest node must handle the data of many shards.

A limitation of performance scaling is that increasing the scale factor (i.e., splitting data into smaller shards) does not mean the system will scale as expected due to the overhead or limitations of each use case—as we saw in Nancy’s fast food restaurant, where the max performance scaling factor was two employees per order.

The elastic throughput and performance scalings described in this post are just examples to help us understand their role in a database system. The real designs to support them are a lot more complicated and need to consider more factors.