On the Implementation of Unreliable Failure Detectors in Partially Synchronous Systems

Authors:
M. Larrea; A. Fernandez; S. Arevalo
Affiliations:
Dept. de Arquitectura y Tecnologia de Comput., Pais Vasco Univ., San Sebastian, Spain
Venue:
IEEE Transactions on Computers
Year:
2004

Abstract

Unreliable failure detectors were proposed by Chandra and Toueg as mechanisms that provide information about process failures. Chandra and Toueg defined eight classes of failure detectors, depending on how accurate this information is, and presented an algorithm implementing a failure detector of one of these classes in a partially synchronous system. This algorithm is based on all-to-all communication and periodically exchanges a number of messages that is quadratic in the number of processes. In this paper, we study the implementability of different classes of failure detectors in several models of partial synchrony. We first show that no failure detector with perpetual accuracy (namely, $\mathcal{P}$, $\mathcal{Q}$, $\mathcal{S}$, and $\mathcal{W}$) can be implemented in these models in systems with even a single failure. We also show that, in these models of partial synchrony, a majority of correct processes is necessary to implement a failure detector of the class $\Theta$ proposed by Aguilera et al. Then, we present a family of distributed algorithms that implement the four classes of unreliable failure detectors with eventual accuracy (namely, $\diamond\mathcal{P}$, $\diamond\mathcal{Q}$, $\diamond\mathcal{S}$, and $\diamond\mathcal{W}$). Our algorithms are based on a logical ring arrangement of the processes, which defines the monitoring and failure information propagation pattern. The resulting algorithms periodically exchange at most a linear number of messages.
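The contrast between all-to-all and ring-based monitoring can be illustrated with a toy simulation. This is only a minimal sketch and not the algorithms of the paper: it assumes each process polls the first process after it in ring order that it does not yet suspect, that a crashed process never replies, and that a missing reply leads to suspicion after one period. The names `next_target`, `run_round`, `simulate`, and `all_to_all_messages` are hypothetical, and timeouts, false suspicions, and the propagation of failure information around the ring are deliberately left out.

```python
def next_target(p, suspected, n):
    """Return the first process after p in ring order that p does not suspect."""
    for step in range(1, n):
        q = (p + step) % n
        if q not in suspected:
            return q
    return None  # p suspects every other process


def run_round(n, crashed, suspected):
    """Run one monitoring period and return the number of messages sent.

    Assumption of this sketch: one query per monitoring edge, plus one
    reply when the monitored process is still alive.
    """
    messages = 0
    for p in range(n):
        if p in crashed:
            continue  # crashed processes send nothing
        q = next_target(p, suspected[p], n)
        if q is None:
            continue
        messages += 1  # query p -> q
        if q in crashed:
            suspected[p].add(q)  # no reply within the period: suspect q
        else:
            messages += 1  # reply q -> p
    return messages


def simulate(n, crashed, rounds):
    """Simulate several periods; return suspicion lists and message counts."""
    suspected = {p: set() for p in range(n)}
    per_round = [run_round(n, crashed, suspected) for _ in range(rounds)]
    return suspected, per_round


def all_to_all_messages(n):
    """Messages per period if every process sends to every other process."""
    return n * (n - 1)


if __name__ == "__main__":
    suspected, per_round = simulate(n=5, crashed={1, 2}, rounds=3)
    print("suspected by process 0:", sorted(suspected[0]))
    print("ring messages per period:", per_round)
    print("all-to-all messages per period:", all_to_all_messages(5))
```

In this toy model, each correct process has at most one outgoing monitoring edge, so the per-period message count is bounded by 2n, whereas the all-to-all pattern sends n(n - 1) messages. The constant 2 is a consequence of the query/reply assumption made here and should not be read as the bound proved in the paper.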