Zyzzyva: Speculative Byzantine fault tolerance

Authors:
Ramakrishna Kotla;Lorenzo Alvisi;Mike Dahlin;Allen Clement;Edmund Wong
Affiliations:
Microsoft Research, Silicon Valley, Mountain View, CA;The University of Texas at Austin, Austin, TX;The University of Texas at Austin, Austin, TX;The University of Texas at Austin, Austin, TX;The University of Texas at Austin, Austin, TX
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
2010

Abstract

A longstanding vision in distributed systems is to build reliable systems from unreliable components. An enticing formulation of this vision is Byzantine Fault-Tolerant (BFT) state machine replication, in which a group of servers collectively act as a correct server even if some of the servers misbehave or malfunction in arbitrary (“Byzantine”) ways. Despite this promise, practitioners hesitate to deploy BFT systems, at least partly because of the perception that BFT must impose high overheads. In this article, we present Zyzzyva, a protocol that uses speculation to reduce the cost of BFT replication. In Zyzzyva, replicas reply to a client's request without first running an expensive three-phase commit protocol to agree on the order to process requests. Instead, they optimistically adopt the order proposed by a primary server, process the request, and reply immediately to the client. If the primary is faulty, replicas can become temporarily inconsistent with one another, but clients detect inconsistencies, help correct replicas converge on a single total ordering of requests, and only rely on responses that are consistent with this total order. This approach allows Zyzzyva to reduce replication overheads to near their theoretical minima and to achieve throughputs of tens of thousands of requests per second, making BFT replication practical for a broad range of demanding services.
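
The client-side check described above, in which a client relies only on responses consistent with a single total order, can be illustrated with a minimal sketch. It is not the authors' implementation. The thresholds (all 3f + 1 replies matching for immediate completion, at least 2f + 1 matching for a second commit step) are taken from the full Zyzzyva protocol description rather than from this abstract, and the function name `classify_responses`, the action labels, and the reduction of each reply to a single hashable digest are hypothetical simplifications.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def classify_responses(responses: Iterable[Hashable], f: int) -> Tuple[str, int]:
    """Classify speculative replies from n = 3f + 1 replicas (illustrative only).

    Each entry in `responses` stands for one replica's reply, reduced to a
    hashable digest of (result, proposed order). Returns a hypothetical
    action label and the size of the largest group of matching replies.
    """
    n = 3 * f + 1  # assumed replica count for tolerating f Byzantine faults
    counts = Counter(responses)
    matching = max(counts.values(), default=0)

    if matching == n:
        # Every replica reported the same result and order: accept at once.
        return "complete", matching
    if matching >= 2 * f + 1:
        # Enough agreement to assemble a commit certificate and ask the
        # replicas to commit before the client relies on the result.
        return "commit", matching
    # Too few matching replies: the client would retransmit the request.
    return "retry", matching
```

For example, with f = 1 (four replicas), four identical replies yield `"complete"`, three identical replies yield `"commit"`, and a two-two split yields `"retry"`. A real client would also verify signatures or authenticators on each reply and apply timeouts, both of which this sketch omits.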