Bobtail: avoiding long tails in the cloud

  • Authors:
  • Yunjing Xu;Zachary Musgrave;Brian Noble;Michael Bailey

  • Affiliations:
  • University of Michigan;University of Michigan;University of Michigan;University of Michigan

  • Venue:
  • nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
  • Year:
  • 2013

Quantified Score

Hi-index 0.02

Visualization

Abstract

Highly modular data center applications such as Bing, Facebook, and Amazon's retail platform are known to be susceptible to long tails in response times. Services such as Amazon's EC2 have proven attractive platforms for building similar applications. Unfortunately, virtualization used in such platforms exacerbates the long tail problem by factors of two to four. Surprisingly, we find that poor response times in EC2 are a property of nodes rather than the network, and that this property of nodes is both pervasive throughout EC2 and persistent over time. The root cause of this problem is co-scheduling of CPU-bound and latency-sensitive tasks. We leverage these observations in Bobtail, a system that proactively detects and avoids these bad neighboring VMs without significantly penalizing node instantiation. With Bobtail, common communication patterns benefit from reductions of up to 40% in 99.9th percentile response times.