Many protocols today can cope with process crashes, but a crash is only one particular kind of faulty behavior. Tolerating more severe failures (e.g., send omission, receive omission, and arbitrary failures), which arise in practice from malicious attacks or unexpected software errors, is a real practical challenge. It is usually addressed either by modifying, in an ad hoc manner, the code of a crash-resilient protocol or by designing a new protocol from scratch. This paper proposes an alternative methodology for detecting processes that experience arbitrary failures. On this basis, it introduces the notions of a liveness failure detector and a safety failure detector as two independent software components. With this approach, the nature of the failures experienced by processes becomes transparent to the protocol that uses these components. The methodology has two advantages. First, it makes it possible to increase the resilience of a protocol designed for the crash failure model without changing its code, by concentrating only on the design of a few well-specified components. Second, it clearly separates the task of designing the protocol from the task of detecting faulty processes, which is a methodological improvement. Finally, the feasibility of the approach is shown by providing implementations of liveness and safety failure detectors for two protocols: one solving consensus and the other solving global data computation.
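The abstract does not specify the detectors' interfaces, so what follows is only a minimal Python sketch of the separation it describes, built on our own assumptions. Every name (`Message`, `SafetyFailureDetector`, `LivenessFailureDetector`, `DetectorLayer`, `collect_round`) is hypothetical. The safety detector flags just two example violations (an out-of-domain value, and equivocation, i.e. two different values from one sender in one round), and the liveness detector is a simple timeout-based muteness check. Neither is the paper's implementation. The point is that `collect_round`, written as an ordinary crash-tolerant step, only asks which processes to exclude and never learns why they were excluded.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Message:
    sender: int
    rnd: int
    value: int


class SafetyFailureDetector:
    """Hypothetical safety detector: marks a sender faulty when one of its
    messages violates the (assumed) protocol specification."""

    def __init__(self, domain: Set[int]) -> None:
        self.domain = domain
        self.first_value: Dict[Tuple[int, int], int] = {}
        self.faulty: Set[int] = set()

    def check(self, msg: Message) -> bool:
        """Return True iff the message may be passed to the protocol."""
        # Assumption: the only checked violations are out-of-domain values
        # and equivocation (a second, different value in the same round).
        first = self.first_value.setdefault((msg.sender, msg.rnd), msg.value)
        if msg.value not in self.domain or first != msg.value:
            self.faulty.add(msg.sender)
        return msg.sender not in self.faulty


class LivenessFailureDetector:
    """Hypothetical muteness-style detector: suspects any process that has
    sent nothing for more than `timeout` time units."""

    def __init__(self, processes: Iterable[int], timeout: float) -> None:
        self.timeout = timeout
        self.last_heard: Dict[int, float] = {p: 0.0 for p in processes}

    def heard_from(self, sender: int, now: float) -> None:
        self.last_heard[sender] = now

    def suspects(self, now: float) -> Set[int]:
        return {p for p, t in self.last_heard.items() if now - t > self.timeout}


class DetectorLayer:
    """Sits between the network and the protocol, so the protocol never sees
    which kind of failure caused a process to be excluded."""

    def __init__(self, safety: SafetyFailureDetector,
                 liveness: LivenessFailureDetector) -> None:
        self.safety = safety
        self.liveness = liveness

    def deliver(self, msg: Message, now: float) -> Optional[Message]:
        # Any message, valid or not, counts as a sign of life.
        self.liveness.heard_from(msg.sender, now)
        return msg if self.safety.check(msg) else None

    def excluded(self, now: float) -> Set[int]:
        return self.safety.faulty | self.liveness.suspects(now)


def collect_round(layer: DetectorLayer, arrivals: List[Tuple[float, Message]],
                  rnd: int, n: int, f: int, now: float) -> Optional[Dict[int, int]]:
    """Crash-style step written only against the detector layer: return the
    round-`rnd` values of processes that are neither faulty nor suspected if
    at least n - f are available, else None (the caller would keep waiting)."""
    values: Dict[int, int] = {}
    for t, msg in arrivals:
        delivered = layer.deliver(msg, t)
        if delivered is not None and delivered.rnd == rnd:
            values[delivered.sender] = delivered.value
    excluded = layer.excluded(now)
    values = {p: v for p, v in values.items() if p not in excluded}
    return values if len(values) >= n - f else None
```

For example, with n = 4 and f = 1, a process that sends 1 and then 0 in the same round is excluded by the safety detector, a silent process is excluded by the liveness detector once its timeout expires, and in either case `collect_round` proceeds as soon as the remaining three processes have sent their values.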