Since the first version of Chandra and Toueg's seminal paper, "Unreliable failure detectors for reliable distributed systems", appeared in 1991, the failure detector concept has been studied extensively. This is not surprising, as failure detection is pervasive in the design, analysis, and implementation of many fault-tolerant distributed algorithms that constitute the core of distributed system middleware.

The literature on this topic is mostly technical and appears mainly in theoretically inclined journals and conferences. The aim of this paper is to offer an introductory survey of the failure detector concept for readers who are not familiar with it and want to quickly understand its aim, its basic principles, its power, and its limitations. To this end, the paper first describes the motivations that underlie the concept, and then surveys several distributed computing problems, showing how each can be solved with the help of an appropriate failure detector. This short paper thus presents motivations, concepts, problems, definitions, and algorithms. It does not contain proofs. It is aimed at readers who want to understand the basics of failure detectors.