SOSP '87 Proceedings of the eleventh ACM Symposium on Operating systems principles
Consensus in the presence of partial synchrony
Journal of the ACM (JACM)
Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems
PODC '88 Proceedings of the seventh annual ACM Symposium on Principles of distributed computing
Implementing fault-tolerant services using the state machine approach: a tutorial
ACM Computing Surveys (CSUR)
Asynchronous consensus and broadcast protocols
Journal of the ACM (JACM)
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References
IEEE Transactions on Computers
ACM Transactions on Computer Systems (TOCS)
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Automatic I/O hint generation through speculative execution
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Practical Byzantine fault tolerance
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
BASE: using abstraction to improve fault tolerance
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)
The Rampart Toolkit for Building High-Integrity Services
Selected Papers from the International Workshop on Theory and Practice in Distributed Systems
The SecureRing Protocols for Securing Group Communication
HICSS '98 Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences - Volume 3
PODC '83 Proceedings of the second annual ACM symposium on Principles of distributed computing
Another advantage of free choice (Extended Abstract): Completely asynchronous agreement protocols
PODC '83 Proceedings of the second annual ACM symposium on Principles of distributed computing
Processing Transactions over Optimistic Atomic Broadcast Protocols
ICDCS '99 Proceedings of the 19th IEEE International Conference on Distributed Computing Systems
Separating agreement from execution for byzantine fault tolerant services
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
NIST Net: a Linux-based network emulation tool
ACM SIGCOMM Computer Communication Review
High Throughput Byzantine Fault Tolerance
DSN '04 Proceedings of the 2004 International Conference on Dependable Systems and Networks
Distributed Computing
Fault-scalable Byzantine fault-tolerant services
Proceedings of the twentieth ACM symposium on Operating systems principles
Speculative execution in a distributed file system
Proceedings of the twentieth ACM symposium on Operating systems principles
Rx: treating bugs as allergies---a safe method to survive software failures
Proceedings of the twentieth ACM symposium on Operating systems principles
Pulse: a dynamic deadlock detection mechanism using speculative execution
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Proactive recovery in a Byzantine-fault-tolerant system
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Zyzzyva: speculative byzantine fault tolerance
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Low-overhead byzantine fault-tolerant storage
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
AutoBash: improving configuration management with operating system causality analysis
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
HQ replication: a hybrid quorum protocol for byzantine fault tolerance
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
The N-Version Approach to Fault-Tolerant Software
IEEE Transactions on Software Engineering
Parallelizing security checks on commodity hardware
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Remus: high availability via asynchronous virtual machine replication
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
Zyzzyva: Speculative Byzantine fault tolerance
ACM Transactions on Computer Systems (TOCS)
Prophecy: using history for high-throughput fault tolerance
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
SPORC: group collaboration using untrusted cloud resources
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Increasing performance in byzantine fault-tolerant systems with on-demand replica consistency
Proceedings of the sixth conference on Computer systems
Operating system support for application-specific speculation
Proceedings of the sixth conference on Computer systems
PipeCloud: using causality to overcome speed-of-light delays in cloud-based disaster recovery
Proceedings of the 2nd ACM Symposium on Cloud Computing
Improving server applications with system transactions
Proceedings of the 7th ACM european conference on Computer Systems
Probabilistically bounded staleness for practical partial quorums
Proceedings of the VLDB Endowment
Adaptive request batching for byzantine replication
ACM SIGOPS Operating Systems Review
Iwazaru: the byzantine sequencer
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Proceedings of the 7th ACM international conference on Distributed event-based systems
Hi-index | 0.00 |
Replicated state machines are an important and widely-studied methodology for tolerating a wide range of faults. Unfortunately, while replicas should be distributed geographically for maximum fault tolerance, current replicated state machine protocols tend to magnify the effects of high network latencies caused by geographic distribution. In this paper, we examine how to use speculative execution at the clients of a replicated service to reduce the impact of network and protocol latency. We first give design principles for using client speculation with replicated services, such as generating early replies and prioritizing throughput over latency. We then describe a mechanism that allows speculative clients to make new requests through replica-resolved speculation and predicated writes. We implement a detailed case study that applies this approach to a standard Byzantine fault tolerant protocol (PBFT) for replicated NFS and counter services. Client speculation trades in 18% maximum throughput to decrease the effective latency under light workloads, letting us speed up run time on single-client micro-benchmarks 1.08-19× when the client is co-located with the primary. On a macro-benchmark, reduced latency gives the client a speedup of up to 5×.