Using Time Instead of Timeout for Fault-Tolerant Distributed Systems.
ACM Transactions on Programming Languages and Systems (TOPLAS)
Consensus in the presence of partial synchrony
Journal of the ACM (JACM)
Linearizability: a correctness condition for concurrent objects
ACM Transactions on Programming Languages and Systems (TOPLAS)
Implementing fault-tolerant services using the state machine approach: a tutorial
ACM Computing Surveys (CSUR)
Replication in the harp file system
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Impossibility of distributed consensus with one faulty process
Journal of the ACM (JACM)
Practical Byzantine fault tolerance
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Deciding when to forget in the Elephant file system
Proceedings of the seventeenth ACM symposium on Operating systems principles
Reaching Agreement in the Presence of Faults
Journal of the ACM (JACM)
The Byzantine Generals Problem
ACM Transactions on Programming Languages and Systems (TOPLAS)
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
BASE: using abstraction to improve fault tolerance
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Practical byzantine fault tolerance and proactive recovery
ACM Transactions on Computer Systems (TOCS)
The Rampart Toolkit for Building High-Integrity Services
Selected Papers from the International Workshop on Theory and Practice in Distributed Systems
Separating agreement from execution for byzantine fault tolerant services
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
High Throughput Byzantine Fault Tolerance
DSN '04 Proceedings of the 2004 International Conference on Dependable Systems and Networks
BAR fault tolerance for cooperative services
Proceedings of the twentieth ACM symposium on Operating systems principles
Fault-scalable Byzantine fault-tolerant services
Proceedings of the twentieth ACM symposium on Operating systems principles
Speculative execution in a distributed file system
Proceedings of the twentieth ACM symposium on Operating systems principles
Proceedings of the twentieth ACM symposium on Operating systems principles
IEEE Transactions on Dependable and Secure Computing
Proactive recovery in a Byzantine-fault-tolerant system
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Zyzzyva: speculative byzantine fault tolerance
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Attested append-only memory: making adversaries stick to their word
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
EXPLODE: a lightweight, general system for finding serious storage system errors
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
HQ replication: a hybrid quorum protocol for byzantine fault tolerance
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
SafeStore: a durable and practical storage system
ATC'07 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
Making Byzantine fault tolerant systems tolerate Byzantine faults
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
Zeno: eventually consistent Byzantine-fault tolerance
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
Tolerating latency in replicated state machines through client speculation
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
Xbft: byzantine fault tolerance with high performance, low cost, and aggressive fault isolation
Xbft: byzantine fault tolerance with high performance, low cost, and aggressive fault isolation
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
A new paradigm for collision-free hashing: incrementality at reduced cost
EUROCRYPT'97 Proceedings of the 16th annual international conference on Theory and application of cryptographic techniques
Lower bounds for asynchronous consensus
Future directions in distributed computing
Beyond one-third faulty replicas in byzantine fault tolerant systems
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
CheapBFT: resource-efficient byzantine fault tolerance
Proceedings of the 7th ACM european conference on Computer Systems
Byzantine fault-tolerance with commutative commands
OPODIS'11 Proceedings of the 15th international conference on Principles of Distributed Systems
Augustus: scalable and robust storage for cloud applications
Proceedings of the 8th ACM European Conference on Computer Systems
On the efficiency of durable state machine replication
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Hi-index | 0.01 |
A longstanding vision in distributed systems is to build reliable systems from unreliable components. An enticing formulation of this vision is Byzantine Fault-Tolerant (BFT) state machine replication, in which a group of servers collectively act as a correct server even if some of the servers misbehave or malfunction in arbitrary (“Byzantine”) ways. Despite this promise, practitioners hesitate to deploy BFT systems, at least partly because of the perception that BFT must impose high overheads. In this article, we present Zyzzyva, a protocol that uses speculation to reduce the cost of BFT replication. In Zyzzyva, replicas reply to a client's request without first running an expensive three-phase commit protocol to agree on the order to process requests. Instead, they optimistically adopt the order proposed by a primary server, process the request, and reply immediately to the client. If the primary is faulty, replicas can become temporarily inconsistent with one another, but clients detect inconsistencies, help correct replicas converge on a single total ordering of requests, and only rely on responses that are consistent with this total order. This approach allows Zyzzyva to reduce replication overheads to near their theoretical minima and to achieve throughputs of tens of thousands of requests per second, making BFT replication practical for a broad range of demanding services.