Low latency via redundancy

Authors:
Ashish Vulimiri;Philip Brighten Godfrey;Radhika Mittal;Justine Sherry;Sylvia Ratnasamy;Scott Shenker
Affiliations:
UIUC, Urbana, IL, USA;UIUC, Urbana, IL, USA;UC Berkeley, Berkeley, CA, USA;UC Berkeley, Berkeley, CA, USA;UC Berkeley, Berkeley, CA, USA;UC Berkeley and ICSI, Berkeley, CA, USA
Venue:
Proceedings of the ninth ACM conference on Emerging networking experiments and technologies
Year:
2013

Abstract

Low latency is critical for interactive networked applications. But while we know how to scale systems to increase capacity, reducing latency, especially the tail of the latency distribution, can be much more difficult. In this paper, we argue that the use of redundancy is an effective way to convert extra capacity into reduced latency. By initiating redundant operations across diverse resources and using the first result that completes, redundancy improves a system's latency even under exceptional conditions. We study the tradeoff with added system utilization, characterizing the situations in which replicating all tasks reduces mean latency. We then demonstrate empirically that replicating all operations can result in significant mean and tail latency reduction in real-world systems including DNS queries, database servers, and packet forwarding within networks.
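
The core idea in the abstract, issuing the same operation to several resources and keeping whichever result arrives first, can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' implementation: it uses Python's asyncio, and `query_replica`, the replica names, and the fixed delays are hypothetical stand-ins for real operations such as DNS queries sent to different servers.

```python
import asyncio


async def query_replica(name: str, delay: float) -> str:
    """Hypothetical stand-in for one operation sent to one resource.

    The sleep simulates that resource's response time.
    """
    await asyncio.sleep(delay)
    return name


async def first_result(coros):
    """Start all redundant operations at once and return the first to finish.

    Assumption: the first operation to complete succeeded. A fuller version
    would skip failed results and keep waiting on the remaining copies.
    """
    tasks = [asyncio.ensure_future(c) for c in coros]
    try:
        done, _pending = await asyncio.wait(
            tasks, return_when=asyncio.FIRST_COMPLETED
        )
        return next(iter(done)).result()
    finally:
        # Cancel the copies that lost the race so they stop consuming capacity.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def demo() -> str:
    # Two copies of the same request; the observed latency is the smaller delay.
    return await first_result(
        [query_replica("slow-replica", 0.30), query_replica("fast-replica", 0.01)]
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The caller sees the latency of the fastest copy, while the system pays for every copy it started. That extra load is the utilization tradeoff the abstract refers to.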