Solving Atomic Multicast When Groups Crash

Authors:
Nicolas Schiper;Fernando Pedone
Affiliations:
University of Lugano, Switzerland;University of Lugano, Switzerland
Venue:
OPODIS '08 Proceedings of the 12th International Conference on Principles of Distributed Systems
Year:
2008

References:
Reliable communication in the presence of failures. ACM Transactions on Computer Systems (TOCS).
Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Computing Surveys (CSUR).
Unreliable failure detectors for reliable distributed systems. Journal of the ACM (JACM).
The weakest failure detector for solving consensus. Journal of the ACM (JACM).
Fault-tolerant broadcasts and related problems. Distributed Systems (2nd Ed.).
Time, clocks, and the ordering of events in a distributed system. Communications of the ACM.
Genuine atomic multicast in asynchronous distributed systems. Theoretical Computer Science.
Group communication specifications: a comprehensive study. ACM Computing Surveys (CSUR).
Revising the weakest failure detector for uniform reliable broadcast. Proceedings of the 13th International Symposium on Distributed Computing.
A realistic look at failure detectors. DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks.
Thrifty generic broadcast. DISC '00 Proceedings of the 14th International Conference on Distributed Computing.
Fault-tolerant total order multicast to asynchronous groups. SRDS '98 Proceedings of the 17th IEEE Symposium on Reliable Distributed Systems.
Scalable atomic multicast. IC3N '98 Proceedings of the International Conference on Computer Communications and Networks.
Transactions on partially replicated data based on reliable and atomic multicasts. ICDCS '01 Proceedings of the 21st International Conference on Distributed Computing Systems.
Handling message semantics with generic broadcast protocols. Distributed Computing.
Total order broadcast and multicast algorithms: taxonomy and survey. ACM Computing Surveys (CSUR).
On the inherent cost of atomic broadcast and multicast in wide area networks. ICDCN '08 Proceedings of the 9th International Conference on Distributed Computing and Networking.
Early stopping in global data computation. IEEE Transactions on Parallel and Distributed Systems.

Cited by:
Exploiting partitioned synchrony to implement accurate failure detectors. International Journal of Critical Computer-Based Systems.

Abstract

In this paper, we study the atomic multicast problem, a fundamental abstraction for building fault-tolerant systems. In our model, processes are divided into non-empty and disjoint groups. Multicast messages may be addressed to any subset of groups, and each message may be multicast to a different subset. Several papers have previously studied this problem in either local area networks [1,2,3] or wide area networks [4,5]. However, none of them considered atomic multicast when groups may crash. We present two atomic multicast algorithms that tolerate the crash of groups. The first algorithm tolerates an arbitrary number of failures, is genuine (i.e., to deliver a message m, only the addressees of m are involved in the protocol), and uses the perfect failure detector $\mathcal{P}$. We show that among realistic failure detectors, i.e., those that do not predict the future, $\mathcal{P}$ is necessary to solve genuine atomic multicast if we do not bound the number of processes that may fail. Thus, $\mathcal{P}$ is the weakest realistic failure detector for solving genuine atomic multicast when an arbitrary number of processes may crash. Our second algorithm is non-genuine and less resilient to process failures than the first, but it has several advantages: (i) it requires perfect failure detection only within groups, not across the system; (ii) as we show in the paper, it can be modified to rely on unreliable failure detection at the cost of a weaker liveness guarantee; and (iii) it is fast: messages addressed to multiple groups may be delivered within only two inter-group message delays.
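To make the ordering and genuineness requirements concrete, the following is a minimal Python sketch of a trace checker, not a multicast algorithm and not the authors' method. It assumes a hypothetical model in which groups, message destinations, per-process delivery logs, and per-message protocol participants are plain dictionaries; all names (`addressees`, `order_is_acyclic`, `is_genuine`, and so on) are illustrative. The order check uses the acyclic-order formulation common in the atomic multicast literature (if some process delivers m before m', no chain of deliveries may lead back from m' to m); the paper's exact property definitions may differ. The genuineness check follows the abstract's definition: only addressees of m take part in delivering m.

```python
from typing import Dict, List, Set

# Hypothetical trace model (an illustration, not the paper's notation):
Groups = Dict[str, Set[str]]        # group id -> ids of its member processes
Dest = Dict[str, Set[str]]          # message id -> ids of the groups it is addressed to
Log = Dict[str, List[str]]          # process id -> message ids in delivery order
Participants = Dict[str, Set[str]]  # message id -> processes that took part in delivering it


def addressees(msg: str, groups: Groups, dest: Dest) -> Set[str]:
    """Every process that belongs to some group the message is addressed to."""
    return set().union(*(groups[g] for g in dest[msg]))


def only_addressees_deliver(groups: Groups, dest: Dest, log: Log) -> bool:
    """A process may only deliver messages addressed to its own group."""
    return all(p in addressees(m, groups, dest) for p, msgs in log.items() for m in msgs)


def order_is_acyclic(log: Log) -> bool:
    """Add an edge m -> m' whenever some process delivers m before m', then
    use a depth-first search to check that these edges form no cycle."""
    succ: Dict[str, Set[str]] = {}
    for msgs in log.values():
        for i, m in enumerate(msgs):
            succ.setdefault(m, set()).update(msgs[i + 1:])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {m: WHITE for m in succ}

    def has_cycle_from(m: str) -> bool:
        color[m] = GREY
        for n in succ[m]:
            if color[n] == GREY:  # back edge: a cycle exists
                return True
            if color[n] == WHITE and has_cycle_from(n):
                return True
        color[m] = BLACK
        return False

    return not any(color[m] == WHITE and has_cycle_from(m) for m in succ)


def is_genuine(groups: Groups, dest: Dest, participants: Participants) -> bool:
    """Genuineness: only addressees of m are involved in delivering m."""
    return all(procs <= addressees(m, groups, dest) for m, procs in participants.items())
```

As a usage note, the trace `{"p1": ["a", "b"], "p2": ["b", "c"], "p3": ["c", "a"]}` contains no pair of processes that deliver a common pair of messages in opposite orders, yet `order_is_acyclic` rejects it because a precedes b, b precedes c, and c precedes a. This is why the acyclic formulation is stronger than simple pairwise agreement once messages can be addressed to overlapping sets of groups.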