Distributed Systems for Fun and Profit

The author wanted a text that would bring together the ideas behind many of the more recent distributed systems – systems such as Amazon’s Dynamo, Google’s BigTable and MapReduce, Apache’s Hadoop and so on.

In this text he has tried to provide a more accessible introduction to distributed systems. To him, that means two things: introducing the key concepts that you will need in order to have a good time reading more serious texts, and providing a narrative that covers things in enough detail that you get the gist of what’s going on without getting stuck on details. Beyond 2013, you’ve got the Internet, and you can selectively read more about the topics you find most interesting.

In his view, much of distributed programming is about dealing with the implications of two consequences of distribution:

that information travels at the speed of light
that independent things fail independently

In other words, the core of distributed programming is dealing with distance (duh!) and having more than one thing (duh!). These constraints define a space of possible system designs, and his hope is that after reading this you’ll have a better sense of how distance, time and consistency models interact.

This text is focused on the distributed programming and systems concepts you’ll need to understand commercial systems in the data center. It would be madness to attempt to cover everything. You’ll learn many key protocols and algorithms (covering, for example, many of the most cited papers in the discipline), including some exciting new ways to look at eventual consistency that still haven’t made it into college textbooks – such as CRDTs and the CALM theorem.

The first chapter covers distributed systems at a high level by introducing a number of important terms and concepts.
The second chapter dives deeper into abstractions and impossibility results.
The third chapter discusses time, order and clocks, along with their various uses (such as vector clocks and failure detectors).
The fourth chapter introduces the replication problem, and the two basic ways in which it can be performed.
The fifth chapter discusses replication with weak consistency guarantees.
The appendix covers recommendations for further reading.