Implementing fault-tolerant services using the state machine approach: a tutorial
ACM Computing Surveys (CSUR)
ACM SIGOPS Operating Systems Review
Horus: a flexible group communication system
Communications of the ACM
ACM Transactions on Computer Systems (TOCS)
RecPlay: a fully integrated practical record/replay system
ACM Transactions on Computer Systems (TOCS)
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
BASE: using abstraction to improve fault tolerance
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Practical byzantine fault tolerance and proactive recovery
ACM Transactions on Computer Systems (TOCS)
A "flight data recorder" for enabling full-system multiprocessor deterministic replay
Proceedings of the 30th annual international symposium on Computer architecture
Separating agreement from execution for byzantine fault tolerant services
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
High Throughput Byzantine Fault Tolerance
DSN '04 Proceedings of the 2004 International Conference on Dependable Systems and Networks
BrowserShield: vulnerability-driven filtering of dynamic HTML
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
HQ replication: a hybrid quorum protocol for byzantine fault tolerance
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Zyzzyva: speculative byzantine fault tolerance
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Tolerating byzantine faults in transaction processing systems using commit barrier scheduling
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Execution replay of multiprocessor virtual machines
Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Remus: high availability via asynchronous virtual machine replication
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
DMP: deterministic shared memory multiprocessing
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Making Byzantine fault tolerant systems tolerate Byzantine faults
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
Zeno: eventually consistent Byzantine-fault tolerance
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
PRES: probabilistic replay with execution sketching on multiprocessors
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
ODR: output-deterministic replay for multicore debugging
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
CoreDet: a compiler and runtime system for deterministic multithreaded execution
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Respec: efficient online multiprocessor replayvia speculation and external determinism
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Mencius: building efficient replicated state machines for WANs
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
ZooKeeper: wait-free coordination for internet-scale systems
USENIXATC'10 Proceedings of the 2010 USENIX conference on USENIX annual technical conference
The case for determinism in database systems
Proceedings of the VLDB Endowment
Deterministic process groups in dOS
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Efficient system-enforced deterministic parallelism
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
DoublePlay: parallelizing sequential logging and replay
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
RCDC: a relaxed consistency deterministic computer
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
ZZ and the art of practical BFT execution
Proceedings of the sixth conference on Computer systems
Calvin: Deterministic or not? Free will to choose
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Detecting failures in distributed systems with the Falcon spy network
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Dthreads: efficient deterministic multithreading
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Efficient deterministic multithreading through schedule relaxation
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Improving server applications with system transactions
Proceedings of the 7th ACM european conference on Computer Systems
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
There is more consensus in Egalitarian parliaments
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
COLO: COarse-grained LOck-stepping virtual machines for non-stop service
Proceedings of the 4th annual Symposium on Cloud Computing
Towards transparent hardening of distributed systems
Proceedings of the 9th Workshop on Hot Topics in Dependable Systems
Hi-index | 0.00 |
This paper presents Eve, a new Execute-Verify architecture that allows state machine replication to scale to multi-core servers. Eve departs from the traditional agree-execute architecture of state machine replication: replicas first execute groups of requests concurrently and then verify that they can reach agreement on a state and output produced by a correct replica; if they can not, they roll back and execute the requests sequentially. Eve minimizes divergence using application-specific criteria to organize requests into groups of requests that are unlikely to interfere. Our evaluation suggests that Eve's unique ability to combine execution independence with nondetermistic interleaving of requests enables high-performance replication for multi-core servers while tolerating a wide range of faults, including elusive concurrency bugs.