The difficulty of designing fault-tolerant distributed algorithms increases with the severity of the failures that an algorithm must tolerate, especially in systems with synchronous message passing. This paper considers methods that automatically translate algorithms tolerant of simple crash failures into ones tolerant of more severe failures. These translations simplify the design task by allowing algorithm designers to assume that processors fail only by stopping. Such translations can be quantified by two measures: fault-tolerance, which measures how many processors must remain correct for the translation to be correct, and round-complexity, which measures how much the translation increases the running time of an algorithm. Understanding these translations and their limitations with respect to these measures can provide insight into the relative impact of different models of faulty behavior on the ability to provide fault-tolerant applications for systems with synchronous message passing.

This paper considers translations from crash failures to each of the following types of more severe failures: omission to send messages; omission to send and receive messages; and totally arbitrary behavior. It shows that previously developed translations to send-omission failures are optimal with respect to both fault-tolerance and round-complexity. It exhibits a hierarchy of translations to general (send/receive) omission failures that improves upon the fault-tolerance of previously developed translations. These translations are optimal in that they cannot be improved with respect to one measure without negatively affecting the other; that is, the hierarchy of translations is matched by a corresponding hierarchy of impossibility results. The paper also gives a hierarchy of translations to arbitrary failures that improves upon the round-complexity of previously developed translations. These translations are near-optimal;
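The optimality notion used for the general-omission hierarchy (no translation can be improved in one measure without worsening the other) is a non-dominance, or Pareto, condition over fault-tolerance and round-complexity. The following is a minimal sketch of that condition under stated assumptions, not the paper's construction: each translation is summarized by a hypothetical requirement n > k·t (n processors, at most t faulty; smaller k means better fault-tolerance) and by a hypothetical constant number of physical rounds per simulated crash-model round (smaller means better round-complexity). The class, function names, and all numbers are illustrative and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    """A crash-to-omission translation summarized by the two measures.

    ASSUMPTION: both measures are modeled with hypothetical closed forms;
    they are illustrations, not results from the paper.
    """
    name: str
    min_ratio: int         # correct whenever n > min_ratio * t (smaller = more fault-tolerant)
    rounds_per_round: int  # physical rounds per simulated crash-model round (smaller = faster)

    def tolerates(self, n: int, t: int) -> bool:
        """Fault-tolerance: do enough processors remain correct for this translation?"""
        return n > self.min_ratio * t

    def running_time(self, crash_rounds: int) -> int:
        """Round-complexity: rounds needed to run an r-round crash-tolerant algorithm."""
        return self.rounds_per_round * crash_rounds


def dominates(a: Translation, b: Translation) -> bool:
    """True if a is no worse than b in both measures and strictly better in at least one."""
    no_worse = a.min_ratio <= b.min_ratio and a.rounds_per_round <= b.rounds_per_round
    strictly_better = a.min_ratio < b.min_ratio or a.rounds_per_round < b.rounds_per_round
    return no_worse and strictly_better


def pareto_optimal(translations: list[Translation]) -> list[Translation]:
    """Keep the translations that no other translation dominates."""
    return [a for a in translations if not any(dominates(b, a) for b in translations)]


if __name__ == "__main__":
    # Hypothetical catalogue: numbers chosen only to illustrate a trade-off.
    catalogue = [
        Translation("A", min_ratio=2, rounds_per_round=4),
        Translation("B", min_ratio=3, rounds_per_round=3),
        Translation("C", min_ratio=4, rounds_per_round=2),
        Translation("D", min_ratio=4, rounds_per_round=3),  # dominated by B and by C
    ]
    frontier = pareto_optimal(catalogue)
    print([tr.name for tr in frontier])                   # ['A', 'B', 'C']
    print([tr.tolerates(n=10, t=3) for tr in frontier])   # [True, True, False]
    print(frontier[0].running_time(crash_rounds=5))       # 20
```

In this toy catalogue, A, B, and C form a hierarchy in the sense described above: moving from A toward C buys fewer rounds per simulated round at the cost of requiring more correct processors, while D is discarded because B (and C) is at least as good in both measures.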