A short introduction to failure detectors for asynchronous distributed systems

Authors:
Michel Raynal
Affiliations:
IRISA, Rennes Cedex, France
Venue:
ACM SIGACT News
Year:
2005


Abstract

Since the first version of Chandra and Toueg's seminal paper "Unreliable failure detectors for reliable distributed systems" appeared in 1991, the failure detector concept has been studied extensively. This is not surprising: failure detection is pervasive in the design, analysis, and implementation of many fault-tolerant distributed algorithms that form the core of distributed system middleware.

The literature on this topic is mostly technical and appears mainly in theoretically inclined journals and conferences. The aim of this paper is to offer an introductory survey of the failure detector concept for readers who are not familiar with it and want to quickly understand its aim, basic principles, power, and limitations. To this end, the paper first describes the motivations that underlie the concept, and then surveys several distributed computing problems, showing how each can be solved with the help of an appropriate failure detector. This short paper thus presents motivations, concepts, problems, definitions, and algorithms. It contains no proofs and is aimed at readers who want to understand the basics of failure detectors.