Exploiting partitioned synchrony to implement accurate failure detectors

Authors:
Raimundo José de Araújo Macêdo;Sérgio Gorender
Affiliations:
Distributed System Laboratory (LaSiD), Computer Science Department, Federal University of Bahia, Campus de Ondina, 40170-110, Salvador, Brazil (both authors)
Venue:
International Journal of Critical Computer-Based Systems
Year:
2012

Abstract

We exploit partitioned synchrony to show that accurate failure detectors can be implemented in a non-synchronous distributed system. To that end, we introduce the partitioned synchronous system (Spa), which is weaker than the conventional synchronous system. Given a trivially implementable timeliness oracle and a set of properties that must hold in Spa (such as strong partitioned synchrony), we show how to implement a perfect failure detector P in Spa. Moreover, we show that even when strong partitioned synchrony does not hold, the existing synchronous partitions can still be used to improve the robustness of applications, by means of a partially perfect (and accurate) failure detector named xP. We also discuss how applications can benefit from these failure detectors and present related experimental data. The paper presents the properties and algorithms needed to implement P and xP, together with the corresponding correctness proofs.
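
The abstract does not reproduce the algorithms, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical timeliness oracle `is_timely(p, q)` that reports whether the link between two processes has a known delay bound, and a hypothetical `bound` covering heartbeat period plus maximum delay on such links. Under those assumptions, a missed heartbeat over a timely link can safely be read as a crash, while a process reachable only over untimely links is never suspected, which preserves accuracy at the cost of completeness for those processes. All names (`PartialDetector`, `is_timely`, `bound`, `start`) are invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


@dataclass
class PartialDetector:
    """Toy heartbeat detector that suspects only over timely links (assumed model)."""
    me: str
    is_timely: Callable[[str, str], bool]  # hypothetical timeliness oracle
    bound: float                           # assumed bound (seconds) for timely links
    start: float = 0.0                     # time at which monitoring began
    last_heard: Dict[str, float] = field(default_factory=dict)
    suspected: Set[str] = field(default_factory=set)

    def on_heartbeat(self, sender: str, now: float) -> None:
        # Record the arrival time of the latest heartbeat from `sender`.
        self.last_heard[sender] = now

    def check(self, peers: Iterable[str], now: float) -> Set[str]:
        for p in peers:
            if p == self.me or p in self.suspected:
                continue
            if not self.is_timely(self.me, p):
                # No delay bound on this link: a timeout proves nothing,
                # so the peer is never suspected (accuracy is kept).
                continue
            if now - self.last_heard.get(p, self.start) > self.bound:
                # Bound exceeded on a timely link: under the assumed
                # model this can only mean that `p` has crashed.
                self.suspected.add(p)
        return set(self.suspected)
```

For example, with processes `a` and `b` connected by a timely link and `c` reachable only over an untimely one, a detector at `a` eventually suspects a silent `b` but never `c`, however long `c` stays silent.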