Zeno: eventually consistent Byzantine-fault tolerance

Authors:
Atul Singh (MPI-SWS and Rice University); Pedro Fonseca (MPI-SWS); Petr Kuznetsov (TU Berlin, Deutsche Telekom Laboratories); Rodrigo Rodrigues (MPI-SWS); Petros Maniatis (Intel Research Berkeley)
Venue:
NSDI '09: Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation
Year:
2009

Abstract

Many distributed services are hosted at large, shared, geographically diverse data centers, and they use replication to remain available even when an entire data center becomes unreachable. Recent events show that non-crash faults occur in these services and can lead to long outages. While Byzantine-Fault Tolerance (BFT) could be used to withstand these faults, current BFT protocols can become unavailable if even a small fraction of their replicas are unreachable, because they favor strong safety guarantees (consistency) over liveness (availability). This paper presents Zeno, a novel BFT state machine replication protocol that trades consistency for higher availability. In particular, Zeno replaces strong consistency (linearizability) with a weaker guarantee (eventual consistency): clients can temporarily miss each other's updates, but when the network is stable, the states from the individual partitions are merged by having the replicas agree on a total order for all requests. We have built a prototype of Zeno, and our evaluation using micro-benchmarks shows that Zeno provides better availability than traditional BFT protocols.
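The availability argument and the merge step described in the abstract can be illustrated with a small Python model. This is a minimal sketch under stated assumptions, not Zeno's actual protocol: it assumes n = 3f + 1 replicas, strong (linearizable) requests needing a quorum of 2f + 1 replicas as in PBFT-style protocols, and weak requests needing only f + 1. Replica agreement on a total order is replaced by a hypothetical deterministic rule (order by client id, then client timestamp). The names `can_commit`, `Request`, `execute`, and `merge` are illustrative only.

```python
from dataclasses import dataclass


def can_commit(f: int, reachable: int, weak: bool) -> bool:
    """Return True if a partition with `reachable` replicas can make progress.

    Assumption: strong requests need 2f + 1 replicas, weak requests f + 1.
    """
    quorum = f + 1 if weak else 2 * f + 1
    return reachable >= quorum


@dataclass(frozen=True, order=True)
class Request:
    client_id: int   # hypothetical field: issuing client
    timestamp: int   # hypothetical field: per-client sequence number
    key: str
    value: int


def execute(requests):
    """Apply requests in order to an empty key-value state (last write wins)."""
    state = {}
    for r in requests:
        state[r.key] = r.value
    return state


def merge(*partition_logs):
    """Combine the weak requests seen in each partition into one total order.

    Every replica applies the same deterministic sort (a stand-in for
    agreement), so all replicas reach the same state after the merge.
    """
    combined = set()
    for log in partition_logs:
        combined.update(log)
    return sorted(combined)


f = 1  # n = 3f + 1 = 4 replicas, split 2 / 2 by a network partition
print(can_commit(f, reachable=2, weak=False))  # False: strong request blocks
print(can_commit(f, reachable=2, weak=True))   # True: weak request proceeds

# Each side accepts a weak write the other side does not see.
side_a = [Request(1, 1, "x", 10)]
side_b = [Request(2, 1, "x", 20)]
print(execute(side_a), execute(side_b))  # {'x': 10} {'x': 20}: states diverge

# Once the network is stable, both sides merge into the same total order.
merged = merge(side_a, side_b)
print(execute(merged))  # {'x': 20} on every replica
```

The sketch simply re-executes the merged log; it does not model what a real protocol must do about clients that already observed a write that ends up overwritten, nor how Byzantine replicas are prevented from forging log entries.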