A history of the virtual synchrony replication model

Authors: Ken Birman
Affiliations: Cornell University
Venue: Replication
Year: 2010


Abstract

In this chapter, we discuss a widely used fault-tolerant data replication model called virtual synchrony. The model responds to two kinds of needs. The first is the practical question of how best to embed replication into distributed systems. Virtual synchrony defines dynamic process groups with self-managed membership: applications can join or leave groups at will, so a process group behaves almost like a replicated variable that lives in the network. The second need relates to performance. Although state machine replication is relatively easy to understand, protocols that implement it in the standard manner are too slow to be useful in demanding settings, and are hard to deploy in very large data centers of the sort seen in today's cloud-computing environments. Virtual synchrony implementations, in contrast, can deliver updates at the same data rates (and with the same low latencies) as IP multicast, the fast but unreliable Internet multicast protocol that is often supported directly by hardware. The trick that makes these very high levels of performance possible is to hide overheads by piggybacking extra information on the regular messages that carry updates. The virtual synchrony replication model has been very widely adopted, and has been used in everything from air traffic control and stock market systems to data center management platforms marketed by companies such as IBM and Microsoft. Moreover, in recent years, state machine replication protocols, such as those based on Paxos, have begun to incorporate elements of the virtual synchrony model, notably self-managed and very dynamic membership. Our exploration of the model takes the form of a history: we start with the background and then follow the evolution of the model over time.
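The piggybacking idea can be illustrated with a small, hypothetical Python sketch. It is not the Isis or Horus protocol: it assumes a single fixed membership view, reliable FIFO channels, and no failures, and the names `Update`, `Member`, `known_acks`, and `buffer` are invented for illustration. Each outgoing update carries the sender's receive vector (the highest sequence number it holds from every member), so receivers learn which messages every member already has, and can drop them from their retransmission buffers, without any separate acknowledgment traffic.

```python
from dataclasses import dataclass


@dataclass
class Update:
    """An application update with acknowledgment info piggybacked on it."""
    sender: str
    seq: int
    payload: str
    acks: dict[str, int]  # sender's receive vector at send time


class Member:
    """Hypothetical group member: fixed view, reliable FIFO links, no failures."""

    def __init__(self, name: str, view: list[str]) -> None:
        self.name = name
        self.view = list(view)
        self.next_seq = 1
        # Highest sequence number delivered here from each sender.
        self.received = {m: 0 for m in self.view}
        # Latest receive vector reported (via piggybacking) by each member.
        self.known_acks = {m: {s: 0 for s in self.view} for m in self.view}
        # Delivered but not yet stable messages, kept for possible retransmission.
        self.buffer: dict[tuple[str, int], Update] = {}

    def send(self, payload: str) -> Update:
        seq = self.next_seq
        self.next_seq += 1
        self.received[self.name] = seq  # the sender holds its own message
        msg = Update(self.name, seq, payload, dict(self.received))
        self.deliver(msg)  # local delivery; the caller multicasts msg to the others
        return msg

    def deliver(self, msg: Update) -> None:
        self.received[msg.sender] = max(self.received[msg.sender], msg.seq)
        self.buffer[(msg.sender, msg.seq)] = msg
        # Merge the piggybacked vector: no separate ack message is needed.
        reported = self.known_acks[msg.sender]
        for s in self.view:
            reported[s] = max(reported[s], msg.acks[s])
        self.known_acks[self.name] = dict(self.received)
        self._discard_stable()

    def _discard_stable(self) -> None:
        # A message is stable once every member has reported receiving it.
        for sender, seq in list(self.buffer):
            if all(self.known_acks[m][sender] >= seq for m in self.view):
                del self.buffer[(sender, seq)]


# Three members each multicast one update; stability is learned only
# from acknowledgment vectors riding on later updates.
view = ["a", "b", "c"]
a, b, c = (Member(n, view) for n in view)
m1 = a.send("x=1")
b.deliver(m1)
c.deliver(m1)
m2 = b.send("y=2")
a.deliver(m2)
c.deliver(m2)
m3 = c.send("z=3")
a.deliver(m3)
b.deliver(m3)
print(sorted(a.buffer))  # [('c', 1)]: a has not yet heard that b holds c's update
```

The cost of acknowledgment is thus folded into traffic that is sent anyway. In a full implementation, messages that are still unstable when membership changes would need to be handled by a flush step at the view change; that machinery, which is what makes dynamic membership safe, is deliberately not modeled here.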