Tolerating latency in replicated state machines through client speculation

Authors:
Benjamin Wester (University of Michigan); James Cowling (MIT CSAIL); Edmund B. Nightingale (Microsoft Research); Peter M. Chen (University of Michigan); Jason Flinn (University of Michigan); Barbara Liskov (MIT CSAIL)
Venue:
NSDI '09: Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation
Year:
2009

Abstract

Replicated state machines are an important and widely studied methodology for tolerating a wide range of faults. Unfortunately, while replicas should be distributed geographically for maximum fault tolerance, current replicated state machine protocols tend to magnify the effects of the high network latencies that geographic distribution causes. In this paper, we examine how to use speculative execution at the clients of a replicated service to reduce the impact of network and protocol latency. We first give design principles for using client speculation with replicated services, such as generating early replies and prioritizing throughput over latency. We then describe a mechanism that allows speculative clients to make new requests through replica-resolved speculation and predicated writes. We implement a detailed case study that applies this approach to a standard Byzantine fault tolerant protocol (PBFT) for replicated NFS and counter services. Client speculation trades 18% of maximum throughput for lower effective latency under light workloads, speeding up run time on single-client micro-benchmarks by 1.08–19× when the client is co-located with the primary. On a macro-benchmark, reduced latency gives the client a speedup of up to 5×.
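The abstract combines three client-side ideas: acting on an early reply before the replicas have agreed, checkpointing so that a wrong guess can be rolled back, and sending new requests that carry the speculated replies as predicates for the replicas to check. The following is a minimal Python sketch of how those pieces could fit together. It assumes a toy counter service, a client that treats f+1 matching replies as final (as PBFT clients do), and a request format in which each request lists the earlier replies it depends on. All names (`Request`, `CounterReplica`, `SpeculativeClient`, `depends_on`) are hypothetical. This is not the paper's implementation, and the actual encoding and checking of predicates in the paper may differ.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Request:
    """A client request (hypothetical format, not the paper's wire format)."""
    req_id: int
    op: str  # the toy counter service supports only "incr"
    # Predicates: (earlier req_id, reply value the client speculated on).
    depends_on: List[Tuple[int, int]] = field(default_factory=list)


class CounterReplica:
    """One replica of a toy counter service that checks request predicates."""

    def __init__(self) -> None:
        self.value = 0
        self.replies: Dict[int, int] = {}  # req_id -> reply this replica produced

    def execute(self, req: Request) -> Optional[int]:
        # Replica-side check: refuse to run the request if any reply the client
        # speculated on differs from the reply this replica actually produced.
        for dep_id, guessed in req.depends_on:
            if self.replies.get(dep_id) != guessed:
                return None
        if req.op == "incr":
            self.value += 1
        self.replies[req.req_id] = self.value
        return self.value


class SpeculativeClient:
    """Acts on an early reply, then commits or rolls back once f+1 replies match."""

    def __init__(self, f: int) -> None:
        self.f = f                             # faulty replicas tolerated
        self.history: List[int] = []           # replies acted on, in order
        self.speculated: Dict[int, int] = {}   # req_id -> reply acted on
        self.checkpoints: Dict[int, int] = {}  # req_id -> len(history) before it

    def on_early_reply(self, req_id: int, value: int) -> None:
        # Checkpoint, then keep executing as if the early reply were final.
        self.checkpoints[req_id] = len(self.history)
        self.speculated[req_id] = value
        self.history.append(value)

    def on_replies(self, req_id: int, values: List[int]) -> bool:
        """Resolve a speculation: True means commit, False means rollback."""
        value, count = Counter(values).most_common(1)[0]
        if count < self.f + 1:
            raise ValueError("fewer than f+1 matching replies")
        checkpoint = self.checkpoints.pop(req_id)
        if value == self.speculated[req_id]:
            return True  # guess was right; the checkpoint is simply discarded
        # Wrong guess: discard everything done since the checkpoint, including
        # later speculations that were built on top of this one.
        del self.history[checkpoint:]
        for rid in [r for r, cp in self.checkpoints.items() if cp > checkpoint]:
            del self.checkpoints[rid]
            self.speculated.pop(rid, None)
        # Re-execute with the agreed-upon reply.
        self.history.append(value)
        self.speculated[req_id] = value
        return False
```

With four replicas (n = 3f + 1, f = 1), a client that acted on the primary's reply commits when the most common reply, reported by at least f+1 replicas, matches its guess. A request whose predicate names a reply the replicas never produced is refused rather than executed, which is what lets a client issue new requests before its earlier speculations are resolved.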