A history of the virtual synchrony replication model

  • Authors: Ken Birman
  • Affiliations: Cornell University
  • Venue: Replication
  • Year: 2010

Abstract

In this chapter, we discuss a widely used fault-tolerant data replication model called virtual synchrony. The model responds to two kinds of needs. First, there is the practical question of how best to embed replication into distributed systems. Virtual synchrony defines dynamic process groups that have self-managed membership. Applications can join or leave groups at will: a process group is almost like a replicated variable that lives in the network. The second need relates to performance. Although state machine replication is relatively easy to understand, protocols that implement state machine replication in the standard manner are too slow to be useful in demanding settings, and are hard to deploy in very large data centers of the sort seen in today's cloud-computing environments. Virtual synchrony implementations, in contrast, are able to deliver updates at the same data rates (and with the same low latencies) as IP multicast, the fast (but unreliable) Internet multicast protocol that is often supported directly by hardware. The trick that makes it possible to achieve these very high levels of performance is to hide overheads by piggybacking extra information on regular messages that carry updates. The virtual synchrony replication model has been very widely adopted, and has been used in everything from air traffic control and stock market systems to data center management platforms marketed by companies like IBM and Microsoft. Moreover, in recent years, state machine protocols such as those used in support of Paxos have begun to include elements of the virtual synchrony model, such as self-managed and very dynamic membership. Our exploration of the model takes the form of a history. We start by exploring the background, and then follow the evolution of the model over time.