Introduction

Now that we understand how to coordinate processes, we are ready to dive into one of the main use cases for building distributed systems: scalability.

A scalable application can increase its capacity as its load increases. The simplest way to do that is by scaling up and running the application on more expensive hardware, but that only gets you so far, since the application will eventually reach a performance ceiling.

The alternative to scaling up is scaling out by distributing the load over multiple nodes. This part explores three categories — or dimensions — of scalability patterns: functional decomposition, partitioning, and duplication. The beauty of these dimensions is that they are independent of each other and can be combined within the same application.

Functional decomposition

Functional decomposition is the process of taking an application and breaking it down into individual parts. Think of the last time you wrote some code; you most likely decomposed it into functions, classes, and modules. The same idea can be taken further by decomposing an application into separate services, each with its own well-defined responsibility.

Section 12.1 discusses the advantages and pitfalls of splitting an application into a set of independently deployable services.

Section 12.2 describes how external clients can communicate, through an API gateway, with an application that has been decomposed into services. The gateway acts as a proxy for the application by routing, composing, and translating requests.
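To make the routing part of the gateway's job concrete, here is a minimal sketch of prefix-based routing. The paths and internal service addresses are hypothetical placeholders, not taken from the book; a production gateway would also handle composition, translation, authentication, and retries.

```python
# Hypothetical route table: public path prefixes mapped to internal
# services. Clients only see the gateway; they need not know how the
# application is decomposed behind it.
routes = {
    "/orders": "http://orders-service.internal",
    "/users": "http://users-service.internal",
}

def route(path):
    # Forward the request to the first service whose prefix matches.
    for prefix, backend in routes.items():
        if path.startswith(prefix):
            return backend + path
    raise KeyError(f"no route for {path}")

print(route("/orders/42"))  # http://orders-service.internal/orders/42
```

Because the mapping lives in one place, services can be split or merged behind the gateway without breaking external clients.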

Section 12.3 discusses how to decouple an API’s read path from its write path so that their respective implementations can use different technologies that fit their specific use cases.
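The following toy sketch illustrates the idea of separating the two paths: writes are recorded as events, and a separate projection builds a read-optimized view from them. The names and event shape are illustrative only; in practice each path could be backed by a different data store.

```python
# Write path: commands append events describing what happened.
events = []

def handle_command(product, qty):
    events.append({"product": product, "qty": qty})

# Read path: project the events into a structure optimized for queries.
# This projection could run on different hardware, or use a different
# storage technology, than the write path.
def build_view():
    view = {}
    for e in events:
        view[e["product"]] = view.get(e["product"], 0) + e["qty"]
    return view

handle_command("apples", 3)
handle_command("apples", 2)
print(build_view())  # {'apples': 5}
```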

Section 12.4 dives into asynchronous messaging channels that decouple producers on one end of a channel from consumers on the other. Thanks to channels, two parties can communicate even if the destination is temporarily unavailable. Messaging provides several other benefits, which we will explore in this section, along with best practices and pitfalls you can run into.
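An in-process queue is enough to illustrate the decoupling a channel provides: the producer can enqueue messages before any consumer is running. This is only a sketch; a real channel is backed by a durable broker so that messages survive process restarts.

```python
import queue
import threading

# In-memory stand-in for a messaging channel.
channel = queue.Queue()

def producer():
    # Fire and forget: the producer neither knows nor cares whether
    # a consumer is currently running.
    for i in range(3):
        channel.put(f"order-{i}")

def consumer(results):
    # The consumer drains the channel at its own pace.
    for _ in range(3):
        results.append(channel.get())  # blocks until a message arrives

producer()  # messages are buffered even though no consumer exists yet
results = []
worker = threading.Thread(target=consumer, args=(results,))
worker.start()
worker.join()
print(results)  # ['order-0', 'order-1', 'order-2']
```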

Partitioning

When a dataset no longer fits on a single node, it needs to be partitioned across multiple nodes. Partitioning is a general technique that can be used in a variety of circumstances, like sharding TCP connections across backends in a load balancer.

We will explore different sharding strategies in section 13.1, such as range and hash partitioning. Then, in section 13.2, we will discuss how to rebalance partitions either statically or dynamically.
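As a preview of the trade-off between the two strategies, here is a minimal sketch of both. Hash partitioning spreads keys evenly but scatters adjacent keys, while range partitioning keeps nearby keys together at the cost of potential hot spots; the boundary values below are illustrative.

```python
import hashlib
from bisect import bisect_left

def hash_partition(key, num_partitions):
    # Hash partitioning: a stable hash spreads keys evenly across
    # partitions, but adjacent keys land on unrelated partitions,
    # which makes range scans expensive.
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

def range_partition(key, boundaries):
    # Range partitioning: sorted split points assign each key to the
    # partition covering its range, preserving key locality. With
    # boundaries ["g", "n", "t"], partition 0 holds keys up to and
    # including "g", and the last partition holds keys after "t".
    return bisect_left(boundaries, key)

boundaries = ["g", "n", "t"]
print(range_partition("apple", boundaries))  # 0
print(range_partition("zebra", boundaries))  # 3
```

Note that `hash(...)` built into Python is randomized across processes, which is why the sketch uses a stable cryptographic hash instead.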

Duplication

The easiest way to add more capacity to a service is to create more instances of it and have some way of routing, or balancing, requests to them. This can be a fast and cheap way to scale out a stateless service, as long as you have considered the impact on the service’s dependencies. Scaling out a stateful service is significantly more challenging as some form of coordination is required.

Section 14.1 introduces the concept of load balancing requests across nodes and its implementation using commodity machines. We will start with DNS load balancing and then dive into the implementation of load balancers that operate at the transport and application layers of the network stack. Finally, we will discuss geo load balancing, which allows clients to communicate with the geographically closest datacenter.
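The core of any load balancer, regardless of the layer it operates at, is a policy for picking the next backend. The sketch below shows the simplest such policy, round-robin; the backend addresses are hypothetical, and a real balancer would also health-check backends and remove unresponsive ones from rotation.

```python
import itertools

class RoundRobinBalancer:
    """Round-robin sketch: requests are spread across backend
    instances in turn, wrapping around after a full pass."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def next_backend(self):
        return next(self._cycle)

lb = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
print([lb.next_backend() for _ in range(4)])
# the first backend repeats once every backend has been picked
```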

Section 14.2 describes how to replicate data across nodes and keep it in sync. Although we have already discussed one way of doing that with Raft in chapter 10, in this section, we will take a broader look at the topic and explore different approaches with varying trade-offs (single-leader, multi-leader, and leaderless).
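To ground the single-leader approach before we compare it with the alternatives, here is a toy sketch: all writes go through the leader, which forwards each log entry to its followers. Everything here is illustrative and synchronous; a real implementation replicates asynchronously over the network and must handle failures and leader elections.

```python
class Follower:
    def __init__(self):
        self.log = []

    def replicate(self, entry):
        # Followers apply entries in the order the leader sends them.
        self.log.append(entry)

class Leader:
    # Single-leader sketch: clients send writes only to the leader,
    # which appends them to its log and forwards them to followers.
    def __init__(self, followers):
        self.log = []
        self.followers = followers

    def write(self, entry):
        self.log.append(entry)
        for follower in self.followers:
            follower.replicate(entry)

f1, f2 = Follower(), Follower()
leader = Leader([f1, f2])
leader.write("x=1")
leader.write("x=2")
print(f1.log == leader.log)  # True
```

Reads served by a follower may lag the leader when replication is asynchronous, which is one of the trade-offs this section explores.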

Section 14.3 discusses the benefits and pitfalls of caching. We will start with in-process caches, which are easy to implement but have several pitfalls. Then we will look at the pros and cons of external caches.
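As a taste of what an in-process cache looks like, here is a minimal sketch with a least-recently-used eviction policy. Bounding the capacity guards against one common pitfall — the cache growing until it exhausts the process's memory — though it does nothing about others, such as stale entries or duplicated state across instances.

```python
from collections import OrderedDict

class LRUCache:
    # In-process cache sketch: entries are kept in access order, and
    # the least recently used one is evicted once capacity is reached.
    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # touch "a" so "b" becomes least recently used
cache.put("c", 3)      # evicts "b"
print(cache.get("b"))  # None
```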