A distributed algorithm is crash quiescent if it eventually stops sending messages to crashed processes. An algorithm can be made crash quiescent by providing it with either a crash notification service or a reliable communication service. Both services can be implemented in practical environments with failure detectors. Therefore, crash-quiescent failure detection is fundamental to system-wide crash quiescence. We establish necessary and sufficient conditions for crash-quiescent failure detection in partially synchronous environments where a bounded, but unknown, number of consecutive messages can be arbitrarily late or lost. Without a correct majority of processes, not even the weakest oracle for fault-tolerant consensus, ⋄W, can be implemented crash quiescently. With a correct majority, however, the eventually perfect failure detector, ⋄P, is possible. Our ⋄P algorithm is correct in all runs, but improves performance via crash quiescence in any run with a correct majority. We also present a refinement of our ⋄P algorithm to mitigate the overhead of achieving crash quiescence; the resulting bit complexity per utilized link is asymptotically better than or equal to that of non-crash-quiescent counterparts.
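To make the definition of crash quiescence concrete, the following minimal sketch shows a sender that stops addressing a process once it has been notified of that process's crash. It illustrates only the definition, not the ⋄P algorithm summarized above. The class `QuiescentSender`, the `notify_crash` callback standing in for a crash notification service, and the in-memory `sent_log` standing in for the network are all hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QuiescentSender:
    """Hypothetical periodic broadcaster that honors crash notifications."""

    peers: set[str]
    crashed: set[str] = field(default_factory=set)
    # Each entry is (round number, destination); stands in for a real network send.
    sent_log: list[tuple[int, str]] = field(default_factory=list)

    def notify_crash(self, peer: str) -> None:
        # Assumed callback from a crash notification service.
        self.crashed.add(peer)

    def broadcast(self, round_no: int) -> None:
        # Send only to peers not known to have crashed.
        for peer in sorted(self.peers - self.crashed):
            self.sent_log.append((round_no, peer))


def demo() -> list[tuple[int, str]]:
    sender = QuiescentSender(peers={"p1", "p2", "p3"})
    sender.broadcast(1)        # p2 is still believed alive and receives a message
    sender.notify_crash("p2")  # notification arrives after round 1
    sender.broadcast(2)
    sender.broadcast(3)
    return sender.sent_log


if __name__ == "__main__":
    for round_no, peer in demo():
        print(round_no, peer)
```

In this toy run, no message is addressed to `p2` after the notification, which is the quiescence property in miniature. A real system would have to obtain such notifications from a failure detector, and that detector's own messages must also stop reaching crashed processes, which is the problem the abstract addresses.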