Failure Detection vs Group Membership in Fault-Tolerant Distributed Systems: Hidden Trade-Offs

Authors: André Schiper
Venue: PAPM-PROBMIV '02: Proceedings of the Second Joint International Workshop on Process Algebra and Probabilistic Methods, Performance Modeling and Verification
Year: 2002

Abstract

Failure detection and group membership are two important components of fault-tolerant distributed systems. Understanding their roles is essential for developing efficient solutions, not only in failure-free runs but also in runs in which processes crash. Group membership provides consistent information about the status of processes in the system. Failure detectors provide inconsistent information: different processes may hold different, possibly incorrect, suspicions. This paper discusses the trade-offs involved in using these two components and clarifies their roles through three examples. The first example shows a case where group membership may favourably be replaced by a failure detection mechanism. The second illustrates a case where group membership is mandatory. The third shows a case where neither group membership nor failure detectors are needed, since they may be replaced by weak ordering oracles.
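To make the consistent/inconsistent distinction above concrete, the following Python sketch contrasts two toy components. It is an illustration under stated assumptions, not the paper's model or algorithm. `TimeoutFailureDetector` and `MembershipView` are hypothetical names, failure detection is assumed to be heartbeat-and-timeout based, and the membership agreement protocol is not modelled: both members are simply handed the same sequence of views. Process r is slow rather than crashed. Processes p and q use different timeouts and therefore reach different suspicions about r, whereas their membership copies report the same view.

```python
from dataclasses import dataclass, field


@dataclass
class TimeoutFailureDetector:
    """Hypothetical local failure detector: suspects every process whose last
    heartbeat is older than this observer's own timeout. Its output is local
    and may be wrong (a slow process is not a crashed one)."""
    timeout: float
    last_heartbeat: dict[str, float] = field(default_factory=dict)

    def on_heartbeat(self, sender: str, now: float) -> None:
        self.last_heartbeat[sender] = now

    def suspects(self, now: float) -> set[str]:
        return {p for p, t in self.last_heartbeat.items() if now - t > self.timeout}


@dataclass
class MembershipView:
    """Hypothetical local copy of a group membership service. The agreement
    protocol is NOT modelled: we assume every member is handed the same
    sequence of views and installs them in the same order."""
    views: list[frozenset[str]] = field(default_factory=list)

    def install(self, view: set[str]) -> None:
        self.views.append(frozenset(view))

    def current(self) -> frozenset[str]:
        return self.views[-1]


# Failure detection: p and q use different timeouts; r is slow, not crashed.
fd_p = TimeoutFailureDetector(timeout=3.0)
fd_q = TimeoutFailureDetector(timeout=5.0)
for fd, peer in ((fd_p, "q"), (fd_q, "p")):
    fd.on_heartbeat(peer, now=9.0)  # recent heartbeat from the other fast process
    fd.on_heartbeat("r", now=6.0)   # last (delayed) heartbeat from r

p_suspects = fd_p.suspects(now=10.0)  # r silent for 4.0 > 3.0, so suspected
q_suspects = fd_q.suspects(now=10.0)  # r silent for 4.0 <= 5.0, so not suspected

# Group membership: both members install the same agreed sequence of views.
agreed_views = [{"p", "q", "r"}, {"p", "q"}]
members = {name: MembershipView() for name in ("p", "q")}
for view in agreed_views:
    for m in members.values():
        m.install(view)

print("p suspects:", sorted(p_suspects), "| q suspects:", sorted(q_suspects))
print("p view:", sorted(members["p"].current()), "| q view:", sorted(members["q"].current()))
```

Running it prints `p suspects: ['r'] | q suspects: []`, followed by the identical view `['p', 'q']` for both p and q. The sketch deliberately keeps agreement outside the code: a real membership service must run an agreement protocol to provide that consistency, which a local failure detector does not need.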