A methodology to design arbitrary failure detectors for distributed protocols

  • Authors:
  • Roberto Baldoni; Jean-Michel Hélary; Sara Tucci Piergiovanni

  • Affiliations:
  • Dipartimento di Informatica e Sistemistica "Antonio Ruberti", Università di Roma La Sapienza, Via Ariosto 25, Roma, Italy; IRISA, Campus de Beaulieu, 35042 Rennes-Cedex, France; Dipartimento di Informatica e Sistemistica "Antonio Ruberti", Università di Roma La Sapienza, Via Ariosto 25, Roma, Italy

  • Venue:
  • Journal of Systems Architecture: the EUROMICRO Journal
  • Year:
  • 2008

Abstract

Nowadays, there are many protocols able to cope with process crashes, but, unfortunately, a process crash represents only one particular faulty behavior. Handling tougher failures (e.g., send omission failures, receive omission failures, arbitrary failures), which can arise from malicious attacks or unexpected software errors, is a real practical challenge. This is usually achieved either by changing, in an ad hoc manner, the code of a crash-resilient protocol or by devising a new protocol from scratch. This paper proposes an alternative methodology to detect processes experiencing arbitrary failures. On this basis, it introduces the notions of liveness failure detector and safety failure detector as two independent software components. With this approach, the nature of failures experienced by processes becomes transparent to the protocol using the components. This methodology brings two advantages: first, it makes it possible to increase the resilience of a protocol designed in a crash-failure context without changing its code, by concentrating only on the design of a few well-specified components; second, it clearly separates the task of designing the protocol from the task of detecting faulty processes, which is a methodological improvement in itself. Finally, the feasibility of this approach is shown by providing implementations of liveness failure detectors and safety failure detectors for two protocols: one solving consensus and the other solving the global data computation problem.
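
To make the component view described in the abstract concrete, the following minimal Java sketch (hypothetical names and signatures, not taken from the paper) illustrates how a protocol designed for crash failures could consult a liveness failure detector and a safety failure detector as separate components, keeping the nature of the underlying failures transparent to the protocol code.

```java
import java.util.Set;

/** Hypothetical liveness failure detector: suspects processes that
 *  appear to have stopped contributing to the protocol. */
interface LivenessFailureDetector {
    /** Identifiers of processes currently suspected of being silent. */
    Set<Integer> suspected();
}

/** Hypothetical safety failure detector: flags messages whose content
 *  deviates from what a correct process could have sent. */
interface SafetyFailureDetector {
    /** Returns true if the message attributed to process p is
     *  inconsistent with the protocol specification. */
    boolean deviates(int p, byte[] message);
}

/** A crash-resilient protocol skeleton that only queries the two
 *  detectors; its own logic is unaware of how failures manifest. */
class ProtocolSkeleton {
    private final LivenessFailureDetector lfd;
    private final SafetyFailureDetector sfd;

    ProtocolSkeleton(LivenessFailureDetector lfd, SafetyFailureDetector sfd) {
        this.lfd = lfd;
        this.sfd = sfd;
    }

    void onReceive(int sender, byte[] message) {
        if (lfd.suspected().contains(sender)) return; // ignore processes suspected of being silent
        if (sfd.deviates(sender, message)) return;    // discard content from processes behaving arbitrarily
        // ... unchanged crash-failure protocol logic would run here ...
    }
}
```

Under this sketch, raising the resilience of the protocol amounts to plugging in stronger implementations of the two detector interfaces, without touching ProtocolSkeleton itself.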