ZZ and the art of practical BFT execution

Authors:
Timothy Wood;Rahul Singh;Arun Venkataramani;Prashant Shenoy;Emmanuel Cecchet
Affiliations:
University of Massachusetts Amherst, Amherst, MA, USA;University of Massachusetts Amherst, Amherst, MA, USA;University of Massachusetts Amherst, Amherst, MA, USA;University of Massachusetts Amherst, Amherst, MA, USA;University of Massachusetts Amherst, Amherst, MA, USA
Venue:
Proceedings of the sixth conference on Computer systems
Year:
2011

Citing 20
Cited 5

The part-time parliament

ACM Transactions on Computer Systems (TOCS)
Practical Byzantine fault tolerance

OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
BASE: using abstraction to improve fault tolerance

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Practical byzantine fault tolerance and proactive recovery

ACM Transactions on Computer Systems (TOCS)
The Rampart Toolkit for Building High-Integrity Services

Selected Papers from the International Workshop on Theory and Practice in Distributed Systems
The SecureRing Protocols for Securing Group Communication

HICSS '98 Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences - Volume 3
Terra: a virtual machine-based platform for trusted computing

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Separating agreement from execution for byzantine fault tolerant services

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Cheap Paxos

DSN '04 Proceedings of the 2004 International Conference on Dependable Systems and Networks
Farsite: federated, available, and reliable storage for an incompletely trusted environment

OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementation
Fault-scalable Byzantine fault-tolerant services

Proceedings of the twentieth ACM symposium on Operating systems principles
HQ replication: a hybrid quorum protocol for byzantine fault tolerance

OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Zyzzyva: speculative byzantine fault tolerance

Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Tolerating byzantine faults in transaction processing systems using commit barrier scheduling

Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Resilient Intrusion Tolerance through Proactive and Reactive Recovery

PRDC '07 Proceedings of the 13th Pacific Rim International Symposium on Dependable Computing
Remus: high availability via asynchronous virtual machine replication

NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
BFT protocols under fire

NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
Making Byzantine fault tolerant systems tolerate Byzantine faults

NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
The next 700 BFT protocols

Proceedings of the 5th European conference on Computer systems
Increasing performance in byzantine fault-tolerant systems with on-demand replica consistency

Proceedings of the sixth conference on Computer systems

Increasing performance in byzantine fault-tolerant systems with on-demand replica consistency

Proceedings of the sixth conference on Computer systems
CheapBFT: resource-efficient byzantine fault tolerance

Proceedings of the 7th ACM european conference on Computer Systems
Gnothi: separating data and metadata for efficient and available storage replication

USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
All about Eve: execute-verify replication for multi-core servers

OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Iwazaru: the byzantine sequencer

ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems

Abstract

The high replication cost of Byzantine fault-tolerance (BFT) methods has been a major barrier to their widespread adoption in commercial distributed applications. We present ZZ, a new approach that reduces the replication cost of BFT services from 2f+1 to practically f+1. The key insight in ZZ is to use f+1 execution replicas in the normal case and to activate additional replicas only upon failures. In data centers where multiple applications share a physical server, ZZ reduces the aggregate number of execution replicas running in the data center, improving throughput and response times. ZZ relies on virtualization, a technology already employed in modern data centers, for fast replica activation upon failures, and enables newly activated replicas to begin processing requests immediately by fetching state on demand. A prototype implementation of ZZ using the BASE library and Xen shows that, when compared to a system with 2f+1 replicas, our approach yields lower response times and up to 33% higher throughput in a prototype data center with four BFT web applications. We also show that ZZ can handle simultaneous failures and achieve sub-second recovery.
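The replica arithmetic in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the names `active_replicas` and `execute`, the modeling of replicas as plain Python callables, and the policy of waking standby replicas one at a time are all assumptions made for illustration. The sketch only shows why f+1 active replicas suffice when replies agree, and why extra replicas are needed to reach f+1 matching replies when they do not.

```python
from collections import Counter
from typing import Callable, List

Replica = Callable[[str], str]  # hypothetical model: a replica maps a request to a reply


def active_replicas(f: int, apps: int, on_demand: bool) -> int:
    """Execution replicas running in the fault-free case across `apps` services."""
    per_app = f + 1 if on_demand else 2 * f + 1
    return per_app * apps


def execute(request: str, active: List[Replica], standby: List[Replica], f: int) -> str:
    """Return a reply backed by f+1 matching replicas.

    Standby replicas are consulted only if the f+1 active replicas disagree
    (an assumed policy, chosen to mirror "activate only upon failures").
    """
    replies = [replica(request) for replica in active[: f + 1]]
    if len(replies) == f + 1 and len(set(replies)) == 1:
        return replies[0]  # normal case: all f+1 active replicas agree
    # Disagreement: wake standbys one at a time until some reply has f+1 votes.
    for replica in standby:
        replies.append(replica(request))
        value, votes = Counter(replies).most_common(1)[0]
        if votes >= f + 1:
            return value
    raise RuntimeError("no reply reached f+1 matching votes")


if __name__ == "__main__":
    correct: Replica = lambda req: "result:" + req
    faulty: Replica = lambda req: "garbage"
    print(active_replicas(f=1, apps=4, on_demand=False))  # 2f+1 replicas per application
    print(active_replicas(f=1, apps=4, on_demand=True))   # f+1 replicas per application
    print(execute("get", [correct, faulty], [correct], f=1))
```

With f = 1 and four applications, the fault-free replica count in this model drops from 12 to 8. The sketch leaves out agreement, virtual machine activation, and on-demand state fetching.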