All about Eve: execute-verify replication for multi-core servers

Authors:
Manos Kapritsos; Yang Wang; Vivien Quema; Allen Clement; Lorenzo Alvisi; Mike Dahlin
Affiliations:
The University of Texas at Austin; The University of Texas at Austin; Grenoble INP; MPI-SWS; The University of Texas at Austin; The University of Texas at Austin
Venue:
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Year:
2012


Abstract

This paper presents Eve, a new Execute-Verify architecture that allows state machine replication to scale to multi-core servers. Eve departs from the traditional agree-execute architecture of state machine replication: replicas first execute groups of requests concurrently and then verify that they can reach agreement on a state and output produced by a correct replica; if they cannot, they roll back and execute the requests sequentially. Eve minimizes divergence by using application-specific criteria to organize requests into groups that are unlikely to interfere. Our evaluation suggests that Eve's unique ability to combine execution independence with nondeterministic interleaving of requests enables high-performance replication for multi-core servers while tolerating a wide range of faults, including elusive concurrency bugs.
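The execute-verify loop described in the abstract can be illustrated with a small Python sketch. This is a minimal, single-replica illustration under stated assumptions, not Eve's implementation. The following are all hypothetical simplifications:

- the `Request` format, where each request declares the keys it touches;
- the key-based `mix` function, which stands in for Eve's application-specific grouping criteria;
- hashing the whole state with SHA-256;
- the `agree` callback, which stands in for the verification stage.

The replica runs each group of non-conflicting requests on a thread pool and hashes the resulting state. If verification fails, it restores its checkpoint and re-executes the requests sequentially.

```python
import copy
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

State = Dict[str, int]


@dataclass
class Request:
    # Hypothetical request format: the keys it touches (used by the mixer)
    # and an operation that mutates the replica state in place.
    rid: int
    keys: FrozenSet[str]
    op: Callable[[State], None]


def mix(requests: List[Request]) -> List[List[Request]]:
    """Assumed key-based grouping: split requests into parallel batches so
    that no two requests in the same batch touch a common key."""
    batches: List[List[Request]] = []
    keysets: List[set] = []
    for req in requests:
        # Place the request after the last batch it conflicts with, so
        # conflicting requests keep their agreed relative order.
        start = 0
        for i, keys in enumerate(keysets):
            if not keys.isdisjoint(req.keys):
                start = i + 1
        if start == len(batches):
            batches.append([])
            keysets.append(set())
        batches[start].append(req)
        keysets[start] |= req.keys
    return batches


def digest(state: State) -> str:
    """Token summarizing the state (simplification: hashes the whole state)."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def execute_verify(state: State, requests: List[Request],
                   agree: Callable[[str], bool],
                   workers: int = 4) -> Tuple[str, bool]:
    """Execute optimistically in parallel, verify, and roll back on failure.

    `agree` is a hypothetical stand-in for the verification stage: it gets
    this replica's token and reports whether the replicas agree on it.
    Returns (final token, whether a rollback happened).
    """
    checkpoint = copy.deepcopy(state)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in mix(requests):
            # Requests in one parallel batch run concurrently; list() waits
            # for the whole batch and re-raises any exception it produced.
            list(pool.map(lambda r: r.op(state), batch))
    token = digest(state)
    if agree(token):
        return token, False
    # Verification failed: restore the checkpoint, then re-execute the
    # requests one at a time in their agreed order.
    state.clear()
    state.update(checkpoint)
    for req in requests:
        req.op(state)
    return digest(state), True


def add(key: str, amount: int) -> Callable[[State], None]:
    """Hypothetical example operation: increment one key."""
    def op(state: State) -> None:
        state[key] = state.get(key, 0) + amount
    return op


if __name__ == "__main__":
    reqs = [
        Request(1, frozenset({"a"}), add("a", 1)),
        Request(2, frozenset({"b"}), add("b", 2)),
        Request(3, frozenset({"a"}), add("a", 10)),
    ]
    print([[r.rid for r in b] for b in mix(reqs)])  # [[1, 2], [3]]
    state: State = {}
    _, rolled_back = execute_verify(state, reqs, agree=lambda t: False)
    print(sorted(state.items()), rolled_back)  # [('a', 11), ('b', 2)] True
```

In this sketch the rollback path produces the same state as the optimistic path because the declared keys are accurate. In Eve, the sequential fallback matters when concurrent execution diverges, for example because of a concurrency bug or because the grouping criteria missed a conflict.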