On the reliability of consensus-based fault-tolerant distributed computing systems

Authors: Özalp Babaoğlu
Affiliations: Cornell University, Ithaca, NY
Venue: ACM Transactions on Computer Systems (TOCS)
Year: 1987

Abstract

The designer of a fault-tolerant distributed system faces numerous alternatives. Using a stochastic model of processor failure times, we investigate design choices such as replication level, protocol running time, randomized versus deterministic protocols, fault detection, and authentication. We use the probability with which a system produces the correct output as our evaluation criterion. This contrasts with previous fault-tolerance results that guarantee correctness only if the percentage of faulty processors in the system can be bounded. Our results reveal some subtle and counterintuitive interactions between the design parameters and system reliability.
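
The evaluation criterion described in the abstract can be illustrated with a minimal sketch. It is not the paper's model: it assumes independent processors with exponentially distributed failure times, a fixed protocol running time, and the pessimistic rule that the output is correct only if at most t processors fail during the run, where t is the classical bound n >= 3t + 1 for agreement without authentication. The names failure_probability, system_reliability, and byzantine_threshold, and the numeric values for the failure rate and duration, are hypothetical.

```python
from math import comb, exp


def failure_probability(rate: float, duration: float) -> float:
    """P(one processor fails within `duration`), assuming exponential failure times."""
    return 1.0 - exp(-rate * duration)


def system_reliability(n: int, tolerated: int, rate: float, duration: float) -> float:
    """P(at most `tolerated` of n independent processors fail within `duration`).

    Assumption: the system is counted as correct exactly when the number of
    failures stays within the tolerated bound (a pessimistic simplification).
    """
    p = failure_probability(rate, duration)
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(tolerated + 1)
    )


def byzantine_threshold(n: int) -> int:
    """Largest t satisfying n >= 3t + 1 (agreement without authentication)."""
    return (n - 1) // 3


if __name__ == "__main__":
    rate = 1e-3       # hypothetical failures per unit time, per processor
    duration = 10.0   # hypothetical protocol running time
    for n in (1, 4, 7, 10):
        t = byzantine_threshold(n)
        r = system_reliability(n, t, rate, duration)
        print(f"n={n:2d}  t={t}  reliability={r:.9f}")
```

Varying duration together with n (for example, letting the running time grow with the number of rounds) is one way to explore how replication level and protocol running time interact under these assumptions.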