We found that interactive services at Bing have highly variable datacenter-side processing latencies because their processing consists of many sequential stages, parallelization across tens to thousands of servers, and aggregation of responses across the network. To improve the tail latency of such services, we use a few building blocks: reissuing laggards elsewhere in the cluster, new policies that return incomplete results, and speeding up laggards by giving them more resources. Combining these building blocks to reduce overall latency is non-trivial because, for the same amount of resource (e.g., number of reissues), different stages improve their latency by different amounts. We present Kwiken, a framework that takes an end-to-end view of latency improvements and costs. It decomposes the problem of minimizing latency over a general processing DAG into a manageable optimization over individual stages. Through simulations with production traces, we show sizable gains: the 99th percentile of latency improves by over 50% when just 0.1% of responses are allowed to return partial results, and by over 40% for 25% of the services when just 5% extra resources are used for reissues.
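To illustrate why reissuing laggards helps tail latency, here is a small Monte Carlo sketch (our own illustration, not code from the paper): each request's service time is drawn from an assumed long-tailed distribution, and a request still outstanding after a timeout is duplicated, finishing when either copy does. The distribution parameters and timeout below are hypothetical.

```python
# Hypothetical sketch of one tail-latency building block: reissue a request
# on another server if its first copy has not finished within a timeout.
import random

def percentile(samples, p):
    """Nearest-rank p-th percentile of a list of latency samples."""
    s = sorted(samples)
    return s[min(len(s) - 1, int(p / 100.0 * len(s)))]

def sample_latency(rng):
    # Assumed long-tailed service time: 99% of requests are fast
    # (mean 1 unit), 1% land on a slow path (50+ units).
    if rng.random() < 0.99:
        return rng.expovariate(1.0)
    return 50.0 + rng.expovariate(0.1)

def simulate(n, timeout, rng):
    base, reissued = [], []
    for _ in range(n):
        first = sample_latency(rng)
        base.append(first)
        if first > timeout:
            # First copy is a laggard: issue a duplicate at `timeout`;
            # the request completes when either copy finishes.
            second = timeout + sample_latency(rng)
            reissued.append(min(first, second))
        else:
            reissued.append(first)
    return base, reissued

rng = random.Random(42)
base, reissued = simulate(100_000, timeout=10.0, rng=rng)
print("p99 without reissue: %.1f" % percentile(base, 99))
print("p99 with reissue:    %.1f" % percentile(reissued, 99))
```

Under these assumptions only the few percent of requests that exceed the timeout are duplicated, so the extra resource cost stays small while the 99th percentile drops sharply, which mirrors the trade-off between reissue budget and tail latency that the abstract describes.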