Reliable communication in the presence of failures
ACM Transactions on Computer Systems (TOCS)
Implementing fault-tolerant services using the state machine approach: a tutorial
ACM Computing Surveys (CSUR)
Unreliable failure detectors for reliable distributed systems
Journal of the ACM (JACM)
The weakest failure detector for solving consensus
Journal of the ACM (JACM)
Fault-tolerant broadcasts and related problems
Distributed systems (2nd Ed.)
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
Genuine atomic multicast in asynchronous distributed systems
Theoretical Computer Science
Group communication specifications: a comprehensive study
ACM Computing Surveys (CSUR)
Revising the Weakest Failure Detector for Uniform Reliable Broadcast
Proceedings of the 13th International Symposium on Distributed Computing
A Realistic Look At Failure Detectors
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
DISC '00 Proceedings of the 14th International Conference on Distributed Computing
Fault-Tolerant Total Order Multicast to Asynchronous Groups
SRDS '98 Proceedings of the 17th IEEE Symposium on Reliable Distributed Systems
IC3N '98 Proceedings of the International Conference on Computer Communications and Networks
Transactions on Partially Replicated Data based on Reliable and Atomic Multicasts
ICDCS '01 Proceedings of the 21st International Conference on Distributed Computing Systems
Handling message semantics with Generic Broadcast protocols
Distributed Computing
Total order broadcast and multicast algorithms: Taxonomy and survey
ACM Computing Surveys (CSUR)
On the inherent cost of atomic broadcast and multicast in wide area networks
ICDCN '08 Proceedings of the 9th International Conference on Distributed Computing and Networking
Early stopping in Global Data Computation
IEEE Transactions on Parallel and Distributed Systems
Exploiting partitioned synchrony to implement accurate failure detectors
International Journal of Critical Computer-Based Systems
In this paper, we study the atomic multicast problem, a fundamental abstraction for building fault-tolerant systems. In our model, processes are divided into non-empty and disjoint groups. Multicast messages may be addressed to any subset of groups, each message possibly being multicast to a different subset. Several papers previously studied this problem either in local area networks [1,2,3] or wide area networks [4,5]. However, none of them considered atomic multicast when groups may crash. We present two atomic multicast algorithms that tolerate the crash of groups. The first algorithm tolerates an arbitrary number of failures, is genuine (i.e., to deliver a message m, only addressees of m are involved in the protocol), and uses the perfect failure detector $\mathcal{P}$. We show that among realistic failure detectors, i.e., those that do not predict the future, $\mathcal{P}$ is necessary to solve genuine atomic multicast if we do not bound the number of processes that may fail. Thus, $\mathcal{P}$ is the weakest realistic failure detector for solving genuine atomic multicast when an arbitrary number of processes may crash. Our second algorithm is non-genuine and less resilient to process failures than the first algorithm, but it has several advantages: (i) it requires perfect failure detection within groups only, and not across the system; (ii) as we show in the paper, it can be modified to rely on unreliable failure detection at the cost of a weaker liveness guarantee; and (iii) it is fast: messages addressed to multiple groups may be delivered within only two inter-group message delays.
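To make the system model and the genuineness property concrete, the following is a minimal illustrative sketch, not the paper's algorithms. It assumes a hypothetical representation in which groups are named sets of process identifiers and a message carries the set of group names it is addressed to. The names `Message`, `groups_are_valid`, `addressees`, and `is_genuine` are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    ident: str
    dest_groups: frozenset  # names of the groups the message is addressed to


def groups_are_valid(groups):
    """Check the model assumption: groups are non-empty and pairwise disjoint."""
    seen = set()
    for members in groups.values():
        if not members or seen & members:
            return False
        seen |= members
    return True


def addressees(msg, groups):
    """Processes that belong to at least one destination group of msg."""
    return set().union(*(groups[g] for g in msg.dest_groups))


def is_genuine(msg, participants, groups):
    """True if only addressees of msg took part in delivering it.

    `participants` is an assumed trace of the processes that sent or
    received protocol messages on behalf of msg.
    """
    return set(participants) <= addressees(msg, groups)


# Hypothetical system with three groups; m is addressed to g1 and g2 only.
groups = {
    "g1": frozenset({"p1", "p2"}),
    "g2": frozenset({"p3"}),
    "g3": frozenset({"p4", "p5"}),
}
m = Message("m", frozenset({"g1", "g2"}))
```

Under these assumptions, a run in which only `p1`, `p2`, and `p3` exchange messages for `m` satisfies `is_genuine`, whereas a run that also involves `p4` (a member of `g3`, which is not addressed) does not.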