Chronos: predictable low latency for data center applications

Authors:
Rishi Kapoor;George Porter;Malveeka Tewari;Geoffrey M. Voelker;Amin Vahdat
Affiliations:
University of California, San Diego;University of California, San Diego;University of California, San Diego;University of California, San Diego;University of California, San Diego and Google Inc.
Venue:
Proceedings of the Third ACM Symposium on Cloud Computing
Year:
2012

Citing 26
Cited 10

Virtual memory mapped network interface for the SHRIMP multicomputer

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
U-Net: a user-level network interface for parallel and distributed computing

SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Cluster-based scalable network services

Proceedings of the sixteenth ACM symposium on Operating systems principles
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
An implementation and analysis of the virtual interface architecture

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Myrinet: A Gigabit-per-Second Local Area Network

IEEE Micro
The origins of network server latency & the myth of connection scheduling

Proceedings of the joint international conference on Measurement and modeling of computer systems
Probability, Statistics, and Queueing Theory with Computer Science Applications

Probability, Statistics, and Queueing Theory with Computer Science Applications
Measuring the capacity of a web server

USITS'97 Proceedings of the USENIX Symposium on Internet Technologies and Systems on USENIX Symposium on Internet Technologies and Systems
OpenFlow: enabling innovation in campus networks

ACM SIGCOMM Computer Communication Review
A scalable, commodity data center network architecture

Proceedings of the ACM SIGCOMM 2008 conference on Data communication
Real-World Concurrency

Queue - The Concurrency Problem
HyperX: topology, routing, and packaging of efficient large-scale networks

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Analyzing lock contention in multithreaded applications

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Benchmarking cloud serving systems with YCSB

Proceedings of the 1st ACM symposium on Cloud computing
Low overhead concurrency control for partitioned main memory databases

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Data center TCP (DCTCP)

Proceedings of the ACM SIGCOMM 2010 conference
Providing a cloud network infrastructure on a supercomputer

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Corey: an operating system for many cores

OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
An analysis of Linux scalability to many cores

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
It's time for low latency

HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Resizable, scalable, concurrent hash tables via relativistic programming

USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Memcached Design on High Performance RDMA Capable Interconnects

ICPP '11 Proceedings of the 2011 International Conference on Parallel Processing
Many-core key-value store

IGCC '11 Proceedings of the 2011 International Green Computing Conference and Workshops
Less is more: trading a little bandwidth for ultra-low latency in the data center

NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
DeTail: reducing the flow completion time tail in datacenter networks

Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication

SoNIC: precise realtime software access and control of wired networks

nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Thin servers with smart pipes: designing SoC accelerators for memcached

Proceedings of the 40th Annual International Symposium on Computer Architecture
Speeding up distributed request-response workflows

Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
Small is better: avoiding latency traps in virtualized data centers

Proceedings of the 4th annual Symposium on Cloud Computing
vTurbo: accelerating virtual machine I/O processing using designated turbo-sliced core

USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Network interface design for low latency request-response protocols

USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Ubik: efficient cache sharing with strict qos for latency-critical workloads

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
MICA: a holistic approach to fast in-memory key-value storage

NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
SENIC: scalable NIC for end-host rate limiting

NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
mTCP: a highly scalable user-level TCP stack for multicore systems

NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation

Quantified Score

Hi-index	0.00

Visualization

Abstract

In data center applications, predictability in service time and controlled latency, especially tail latency, are essential for building performant applications. This is especially true for applications or services built by accessing data across thousands of servers to generate a user response. Current practice has been to run such services at low utilization to rein in latency outliers, which decreases efficiency and limits the number of service invocations developers can issue while still meeting tight latency budgets. In this paper, we analyze three data center applications, Memcached, OpenFlow, and Web search, to measure the effect of 1) kernel socket handling, NIC interaction, and the network stack, 2) application locks contested in the kernel, and 3) application-layer queueing due to requests being stalled behind straggler threads on tail latency. We propose Chronos, a framework to deliver predictable, low latency in data center applications. Chronos uses a combination of existing and new techniques to achieve this end, for example by supporting Memcached at 200,000 requests per second per server at mean latency of 10 μs with a 99th percentile latency of only 30 μs, a factor of 20 lower than baseline Memcached.