Crash-quiescent failure detection

Authors:
Srikanth Sastry; Scott M. Pike; Jennifer L. Welch
Affiliations:
Department of Computer Science and Engineering, Texas A&M University, College Station, TX (all three authors)
Venue:
DISC'09 Proceedings of the 23rd international conference on Distributed computing
Year:
2009

Abstract

A distributed algorithm is crash quiescent if it eventually stops sending messages to crashed processes. An algorithm can be made crash quiescent by providing it with either a crash notification service or a reliable communication service. Both services can be implemented in practical environments with failure detectors. Therefore, crash-quiescent failure detection is fundamental to system-wide crash quiescence. We establish necessary and sufficient conditions for crash-quiescent failure detection in partially synchronous environments where a bounded, but unknown, number of consecutive messages can be arbitrarily late or lost. Without a correct majority of processes, not even the weakest oracle for fault-tolerant consensus, ⋄W, can be implemented crash quiescently. With a correct majority, however, the eventually perfect failure detector, ⋄P, is possible. Our ⋄P algorithm is correct in all runs, but improves performance via crash quiescence in any run with a correct majority. We also present a refinement of our ⋄P algorithm to mitigate the overhead of achieving crash quiescence; the resulting bit complexity per utilized link is asymptotically better than or equal to that of non-crash-quiescent counterparts.
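
The definition of crash quiescence in the abstract can be illustrated with a small trace check. This is only a sketch under stated assumptions, not the paper's algorithm: the names `Send`, `sends_to_crashed`, and `looks_crash_quiescent`, the integer logical clock, and the `horizon` and `tail` parameters are all hypothetical. Because "eventually" cannot be decided on a finite trace, the check approximates it by requiring that no message is sent to a crashed process during the last `tail` time units of the observed run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Send:
    """One send event in a recorded run (hypothetical trace format)."""
    time: int       # logical time at which the message was sent
    sender: str
    receiver: str


def sends_to_crashed(trace, crash_time):
    """Return the sends whose receiver had already crashed at send time.

    crash_time maps a process name to the time it crashed; processes
    that never crash are simply absent from the mapping.
    """
    return [
        s for s in trace
        if s.receiver in crash_time and s.time >= crash_time[s.receiver]
    ]


def looks_crash_quiescent(trace, crash_time, horizon, tail):
    """Finite-trace approximation of crash quiescence.

    The run is observed up to time `horizon`. It is accepted if every
    send to a crashed process happened before `horizon - tail`, that is,
    if the final `tail` time units contain no such send.
    """
    cutoff = horizon - tail
    return all(s.time < cutoff for s in sends_to_crashed(trace, crash_time))


# Hypothetical run: q crashes at time 5; p keeps sending to q until
# time 8 and then stops, while continuing to send to the live process r.
trace = [Send(t, "p", "q") for t in range(0, 9)]
trace += [Send(t, "p", "r") for t in range(0, 20)]
crash_time = {"q": 5}

wasted = sends_to_crashed(trace, crash_time)
quiescent = looks_crash_quiescent(trace, crash_time, horizon=20, tail=10)
```

In this run `p` sends four messages to `q` after the crash (times 5 through 8) and then falls silent toward `q`, so the check accepts the trace. A run in which `p` kept sending to `q` up to the horizon would be rejected.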