Failure detection with booting in partially synchronous systems

Authors: Josef Widder; Gérard Le Lann; Ulrich Schmid
Affiliations: Embedded Computing Systems Group E182/2, Technische Universität Wien, Vienna, Austria (Widder, Schmid); INRIA Rocquencourt, Projet Novaltis, Le Chesnay Cedex, France (Le Lann)
Venue: EDCC'05, Proceedings of the 5th European Conference on Dependable Computing
Year: 2005

Abstract

Unreliable failure detectors are a well-known means of enriching asynchronous distributed systems with time-free semantics that allow consensus to be solved in the presence of crash failures. Implementing unreliable failure detectors requires a system that provides some synchrony, typically an upper bound on end-to-end message delays. Recently, we introduced an implementation of the perfect failure detector in a novel partially synchronous model, referred to as the Θ-Model, where only the ratio Θ of the maximum to the minimum end-to-end delay of messages simultaneously in transit must be known a priori; the actual delays need not be known, or even bounded. In this paper, we present an alternative failure detector algorithm based on a clock synchronization algorithm for the Θ-Model. It not only outperforms our first implementation with respect to failure detection time but also works during the system booting phase.
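The two ingredients named in the abstract can be illustrated with a small Python sketch. `within_theta` checks the Θ-Model's only timing assumption for a set of messages assumed to be simultaneously in transit: the largest delay is at most Θ times the smallest, so scaling every delay by the same factor leaves the condition intact even though no absolute bound exists. `TickProcess` is a simplified Srikanth–Toueg-style tick synchronization (assuming n ≥ 3f + 1) with a clock-based suspicion rule layered on top: a process suspects q once q's latest tick lags its own clock by more than `xi` ticks. The names `within_theta`, `TickProcess`, `simulate` and the threshold `xi` are hypothetical. This is not the paper's algorithm: any real threshold would have to be derived from Θ, the only timing parameter known a priori, and booting is not modeled here.

```python
from collections import deque
from dataclasses import dataclass, field


def within_theta(delays: list[float], theta: float) -> bool:
    """Theta-Model condition for messages simultaneously in transit:
    max delay <= theta * min delay. No absolute bound on delays is assumed."""
    if not delays:
        return True
    return max(delays) <= theta * min(delays)


@dataclass
class TickProcess:
    # Simplified Srikanth-Toueg-style tick synchronization, assuming n >= 3f + 1.
    # `xi` is a HYPOTHETICAL lag threshold, not a value taken from the paper.
    pid: int
    n: int
    f: int
    xi: int
    clock: int = 0                                 # highest tick this process has sent
    sent: set = field(default_factory=set)         # ticks already broadcast (send once)
    seen: dict = field(default_factory=dict)       # tick -> set of senders
    last_tick: dict = field(default_factory=dict)  # sender -> highest tick received

    def _send(self, k: int) -> list[int]:
        if k in self.sent:
            return []
        self.sent.add(k)
        return [k]

    def start(self) -> list[int]:
        return self._send(0)

    def on_tick(self, sender: int, k: int) -> list[int]:
        """Handle a received (tick k); return the ticks to broadcast in response."""
        self.seen.setdefault(k, set()).add(sender)
        self.last_tick[sender] = max(self.last_tick.get(sender, -1), k)
        out: list[int] = []
        # Relay rule: f+1 senders of a higher tick include at least one correct
        # process (at most f are faulty), so catch up to that tick.
        if k > self.clock and len(self.seen[k]) >= self.f + 1:
            for j in range(self.clock + 1, k + 1):
                out += self._send(j)
            self.clock = k
        # Advance rule: n-f senders of the current tick -> move to the next tick.
        while len(self.seen.get(self.clock, ())) >= self.n - self.f:
            self.clock += 1
            out += self._send(self.clock)
        return out

    def suspects(self) -> set[int]:
        """Suspect q if q's latest tick lags this process's clock by more than xi."""
        return {q for q in range(self.n) if q != self.pid
                and self.last_tick.get(q, -1) < self.clock - self.xi}


def simulate(n: int, f: int, crashed: set[int], xi: int,
             steps: int) -> dict[int, TickProcess]:
    """Deliver broadcasts in global FIFO order; crashed processes never send."""
    procs = {p: TickProcess(p, n, f, xi) for p in range(n) if p not in crashed}
    queue: deque = deque()

    def broadcast(src: int, ticks: list[int]) -> None:
        for k in ticks:
            for dst in procs:  # every correct process, including the sender
                queue.append((dst, src, k))

    for p, proc in procs.items():
        broadcast(p, proc.start())
    for _ in range(steps):
        if not queue:
            break
        dst, src, k = queue.popleft()
        broadcast(dst, procs[dst].on_tick(src, k))
    return procs


if __name__ == "__main__":
    for pid, proc in simulate(n=4, f=1, crashed={3}, xi=3, steps=300).items():
        print(pid, proc.clock, sorted(proc.suspects()))
```

With process 3 crashed, each correct process 0 to 2 should end up suspecting only process 3, while their clocks stay within one tick of each other. The global FIFO delivery order is a simplification: it stands in for, but does not model, delays that are Θ-bounded relative to each other yet otherwise arbitrary.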