Unreliable failure detectors were proposed by Chandra and Toueg as mechanisms that provide information about process failures. Chandra and Toueg defined eight classes of failure detectors, depending on how accurate this information is, and presented an algorithm implementing a failure detector of one of these classes in a partially synchronous system. This algorithm is based on all-to-all communication and periodically exchanges a number of messages that is quadratic in the number of processes. In this paper, we study the implementability of different classes of failure detectors in several models of partial synchrony. We first show that no failure detector with perpetual accuracy (namely, $\mathcal{P}$, $\mathcal{Q}$, $\mathcal{S}$, and $\mathcal{W}$) can be implemented in these models in systems with even a single failure. We also show that, in these models of partial synchrony, a majority of correct processes is necessary to implement a failure detector of the class $\Theta$ proposed by Aguilera et al. Then, we present a family of distributed algorithms that implement the four classes of unreliable failure detectors with eventual accuracy (namely, $\diamond\mathcal{P}$, $\diamond\mathcal{Q}$, $\diamond\mathcal{S}$, and $\diamond\mathcal{W}$). Our algorithms are based on a logical ring arrangement of the processes, which defines the monitoring and failure information propagation pattern. The resulting algorithms periodically exchange at most a linear number of messages.
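To make the quadratic versus linear message counts concrete, the following Python sketch compares the number of heartbeats sent per period under all-to-all monitoring and under a ring in which each process sends to a single neighbour. It is only an illustration under stated assumptions, not the algorithms of the paper: the function names, the process identifiers 0..n-1, and the rule that a process monitors its nearest non-suspected successor are all hypothetical choices made for this example.

```python
from typing import Optional, Set


def all_to_all_messages(n: int) -> int:
    """Heartbeats per period when every process sends to every other process."""
    return n * (n - 1)


def ring_messages(n: int) -> int:
    """Heartbeats per period when each process sends to exactly one ring neighbour."""
    return n if n > 1 else 0


def monitored_successor(p: int, n: int, suspected: Set[int]) -> Optional[int]:
    """Return the nearest process after p on the ring that p does not suspect.

    Assumption for this sketch: processes are numbered 0..n-1 and a process
    skips over every successor it currently suspects. Returns None when p
    suspects all other processes.
    """
    for step in range(1, n):
        q = (p + step) % n
        if q not in suspected:
            return q
    return None


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, all_to_all_messages(n), ring_messages(n))
    # Process 3 in a ring of 5 suspects processes 4 and 0, so it monitors 1.
    print(monitored_successor(3, 5, {4, 0}))
```

In this simplified model the all-to-all pattern costs n(n-1) messages per period while the ring costs n, which matches the quadratic and linear orders stated above; how suspicions are propagated around the ring is deliberately left out of the sketch.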