Bobtail: avoiding long tails in the cloud

Authors:
Yunjing Xu;Zachary Musgrave;Brian Noble;Michael Bailey
Affiliations:
University of Michigan;University of Michigan;University of Michigan;University of Michigan
Venue:
NSDI '13: Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation
Year:
2013

Abstract

Highly modular data center applications such as Bing, Facebook, and Amazon's retail platform are known to be susceptible to long tails in response times. Services such as Amazon's EC2 have proven attractive platforms for building similar applications. Unfortunately, virtualization used in such platforms exacerbates the long tail problem by factors of two to four. Surprisingly, we find that poor response times in EC2 are a property of nodes rather than the network, and that this property of nodes is both pervasive throughout EC2 and persistent over time. The root cause of this problem is co-scheduling of CPU-bound and latency-sensitive tasks. We leverage these observations in Bobtail, a system that proactively detects and avoids these bad neighboring VMs without significantly penalizing node instantiation. With Bobtail, common communication patterns benefit from reductions of up to 40% in 99.9th percentile response times.