To reason about distributed systems, we need to define precisely what can and can’t happen. A system model encodes assumptions about the behavior of nodes, communication links, and timing; think of it as a set of assumptions that allow us to reason about distributed systems by ignoring the complexity of the actual technologies used to implement them.
Let’s start by introducing some models for communication links:
Even though these models are just abstractions of real communication links, they are useful to verify the correctness of algorithms. As we have seen in the previous chapters, it’s possible to build a reliable and authenticated communication link on top of a fair-loss one. For example, TCP does precisely that (and more), while TLS implements authentication (and more).
We can also model the different types of node failures we expect to happen:
While it’s possible to take an unreliable communication link and convert it into a more reliable one using a protocol (e.g., keep retransmitting lost messages), the equivalent isn’t possible for nodes. Because of that, algorithms for different node models look very different from each other.
Byzantine node models are typically used to model safety-critical systems like airplane engine systems, nuclear power plants, financial systems, and other systems where a single entity doesn’t fully control all the nodes1. These use cases are outside of the book’s scope, and the algorithms presented will generally assume a crash-recovery model.
Finally, we can also model the timing assumptions:
In the rest of the book, we will generally assume a system model with fair-loss links, nodes with crash-recovery behavior, and partial synchrony. For the interested reader, “Introduction to Reliable and Secure Distributed Programming” is an excellent theoretical book that explores distributed algorithms for a variety of other system models not considered in this text.
But remember, models are just an abstraction of reality, and sometimes abstractions leak. As you read along, question the models’ assumptions and try to imagine how algorithms that rely on them could break.
For example, digital cryptocurrencies such as Bitcoin implement algorithms that assume Byzantine nodes.↩︎