Less is more: trading a little bandwidth for ultra-low latency in the data center

  • Authors:
  • Mohammad Alizadeh;Abdul Kabbani;Tom Edsall;Balaji Prabhakar;Amin Vahdat;Masato Yasuda

  • Affiliations:
  • Stanford University;Google;Cisco Systems;Stanford University;Google and U.C. San Diego;NEC Corporation, Japan

  • Venue:
  • NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Traditional measures of network goodness--goodput, quality of service, fairness--are expressed in terms of bandwidth. Network latency has rarely been a primary concern because delivering the highest level of bandwidth essentially entails driving up latency--at the mean and, especially, at the tail. Recently, however, there has been renewed interest in latency as a primary metric for mainstream applications. In this paper, we present the HULL (High-bandwidth Ultra-Low Latency) architecture to balance two seemingly contradictory goals: near baseline fabric latency and high bandwidth utilization. HULL leaves 'bandwidth headroom' using Phantom Queues that deliver congestion signals before network links are fully utilized and queues form at switches. By capping utilization at less than link capacity, we leave room for latency sensitive traffic to avoid buffering and the associated large delays. At the same time, we use DCTCP, a recently proposed congestion control algorithm, to adaptively respond to congestion and to mitigate the bandwidth penalties which arise from operating in a bufferless fashion. HULL further employs packet pacing to counter burstiness caused by Interrupt Coalescing and Large Send Offloading. Our implementation and simulation results show that by sacrificing a small amount (e.g., 10%) of bandwidth, HULL can dramatically reduce average and tail latencies in the data center.