Specifying and using a partitionable group communication service

Authors:
Alan Fekete; Nancy Lynch; Alex Shvartsman
Affiliations:
Dept. of Computer Science, University of Sydney, Sydney, Australia; Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA; Dept. of Computer Science and Engineering, University of Connecticut, Storrs, CT and Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
2001


Abstract

Group communication services are becoming accepted as effective building blocks for the construction of fault-tolerant distributed applications. Many specifications for group communication services have been proposed. However, there is still no agreement about what these specifications should say, especially in cases where the services are partitionable, i.e., where communication failures may lead to the simultaneous creation of groups with disjoint memberships, such that each group is unaware of the existence of any other group. In this paper, we present a new, succinct specification for a view-oriented partitionable group communication service. The service associates each message with a particular view of the group membership. All send and receive events for a message occur within the associated view. The service provides a total order on the messages within each view, and each processor receives a prefix of this order. Our specification separates safety requirements from performance and fault-tolerance requirements. The safety requirements are expressed by an abstract, global state machine. To present the performance and fault-tolerance requirements, we include failure-status input actions in the specification; we then give properties saying that consensus on the view and timely message delivery are guaranteed in an execution, provided that the execution stabilizes to a situation in which the failure status stops changing and corresponds to a consistently partitioned system. Because consensus is not required in every execution, the specification is not subject to the existing impossibility results for partitionable systems. Our specification has a simple implementation, based on the membership algorithm of Cristian and Schmuck. We show the utility of the specification by constructing an ordered-broadcast application, using an algorithm (based on algorithms of Amir, Dolev, Keidar, and others) that reconciles information derived from different instantiations of the group. The application manages the view-change activity to build a shared sequence of messages, i.e., the per-view total orders of the group service are combined to give a universal total order. We prove the correctness and analyze the performance and fault-tolerance of the resulting application.
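
The safety properties described in the abstract (each message tied to one view, a total order per view, and each processor receiving a prefix of that order) can be illustrated with a small global state machine. The following Python sketch is only an illustration under simplifying assumptions and is not the paper's formal automaton: the names `View`, `ViewService`, `create_view`, `new_view`, `send`, `receive`, and `delivered` are hypothetical, a sent message is ordered immediately instead of passing through a pending stage, and failure-status inputs and "safe" notifications are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """A view: an identifier plus a set of member processors (hypothetical type)."""
    view_id: int
    members: frozenset


class ViewService:
    """Toy global state machine keeping one totally ordered queue per view."""

    def __init__(self, processors, initial_view):
        self.views = {initial_view.view_id: initial_view}
        # Current view of each processor (None if it is not in the initial view).
        self.current = {
            p: (initial_view.view_id if p in initial_view.members else None)
            for p in processors
        }
        # Per-view total order of (message, sender) pairs.
        self.queue = {initial_view.view_id: []}
        # (processor, view_id) -> number of messages delivered so far.
        self.next_index = {}

    def create_view(self, view):
        # Assumption of this sketch: view identifiers strictly increase.
        if view.view_id <= max(self.views):
            raise ValueError("view id must exceed all existing view ids")
        self.views[view.view_id] = view
        self.queue[view.view_id] = []

    def new_view(self, p, view_id):
        # A processor may only move to a later view that contains it.
        view = self.views[view_id]
        cur = self.current[p]
        if p not in view.members or (cur is not None and view_id <= cur):
            raise ValueError("processor cannot install this view")
        self.current[p] = view_id

    def send(self, p, message):
        # The message is associated with the sender's current view.
        vid = self.current[p]
        if vid is None:
            raise ValueError("processor has no current view")
        self.queue[vid].append((message, p))

    def receive(self, p):
        # Deliver the next message of p's current view, or None if there is none.
        vid = self.current[p]
        if vid is None:
            return None
        i = self.next_index.get((p, vid), 0)
        if i >= len(self.queue[vid]):
            return None
        self.next_index[(p, vid)] = i + 1
        return self.queue[vid][i]

    def delivered(self, p, view_id):
        # Everything p has received in a view: always a prefix of that view's order.
        return self.queue[view_id][: self.next_index.get((p, view_id), 0)]
```

In this sketch, a processor that installs a new view before draining the old view's queue simply never receives the remaining messages of the old view, which is how the prefix property shows up. After a partition into disjoint views, messages sent in one view are invisible to processors in the other.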