In this chapter, we discuss a widely used fault-tolerant data replication model called virtual synchrony. The model responds to two kinds of needs. The first is the practical question of how best to embed replication into distributed systems. Virtual synchrony defines dynamic process groups that have self-managed membership. Applications can join or leave groups at will: a process group is almost like a replicated variable that lives in the network. The second need relates to performance. Although state machine replication is relatively easy to understand, protocols that implement it in the standard manner are too slow to be useful in demanding settings, and are hard to deploy in very large data centers of the sort seen in today's cloud-computing environments. Virtual synchrony implementations, in contrast, are able to deliver updates at the same data rates (and with the same low latencies) as IP multicast, the fast but unreliable Internet multicast protocol that is often supported directly by hardware. The trick that makes this level of performance possible is to hide overheads by piggybacking extra information on the regular messages that carry updates. The virtual synchrony replication model has been very widely adopted, and has been used in everything from air traffic control and stock market systems to data center management platforms marketed by companies like IBM and Microsoft. Moreover, in recent years, state machine protocols such as those used in support of Paxos have begun to include elements of the virtual synchrony model, such as self-managed and very dynamic membership. Our exploration of the model takes the form of a history: we start by exploring the background, and then follow the evolution of the model over time.