The failure detector abstraction

Authors:
Felix C. Freiling;Rachid Guerraoui;Petr Kuznetsov
Affiliations:
University of Mannheim, Mannheim, Germany;EPFL;TU Berlin/Deutsche Telekom Laboratories
Venue:
ACM Computing Surveys (CSUR)
Year:
2011

Abstract

A failure detector is a fundamental abstraction in distributed computing. This article surveys the abstraction along two dimensions. First, we study failure detectors as building blocks that simplify the design of reliable distributed algorithms. In particular, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks: we survey the weakest failure detector question and illustrate how failure detectors can be used to classify problems. We also highlight some limitations of the failure detector abstraction along each of these dimensions.