In data center applications, predictable service time and controlled latency, especially tail latency, are essential for building performant applications. This is especially true for applications or services that generate a user response by accessing data across thousands of servers. Current practice has been to run such services at low utilization to rein in latency outliers, which decreases efficiency and limits the number of service invocations developers can issue while still meeting tight latency budgets. In this paper, we analyze three data center applications, Memcached, OpenFlow, and Web search, to measure how three sources affect tail latency: 1) kernel socket handling, NIC interaction, and the network stack, 2) application locks contested in the kernel, and 3) application-layer queueing due to requests being stalled behind straggler threads. We propose Chronos, a framework to deliver predictable, low latency in data center applications. Chronos uses a combination of existing and new techniques to achieve this end. For example, it supports Memcached at 200,000 requests per second per server at a mean latency of 10 μs with a 99th percentile latency of only 30 μs, a factor of 20 lower than baseline Memcached.
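
The following minimal sketch illustrates why tail latency matters when a single user response depends on many servers. It is not taken from Chronos. It assumes each sub-request's latency is independent, which real systems only approximate, and the function name `prob_any_slow` is hypothetical. Under that assumption, the chance that at least one of `fanout` sub-requests is slower than the per-server 99th percentile is 1 - 0.99^fanout.

```python
def prob_any_slow(fanout: int, percentile: float = 0.99) -> float:
    """Probability that at least one of `fanout` sub-requests exceeds
    the per-server latency at `percentile`.

    Assumption (not from the source): sub-request latencies are
    independent and identically distributed.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    if not 0.0 < percentile < 1.0:
        raise ValueError("percentile must be strictly between 0 and 1")
    # Each sub-request stays under the threshold with probability
    # `percentile`, so all of them do with probability percentile ** fanout.
    return 1.0 - percentile ** fanout


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"fan-out {n:>4}: P(at least one slow) = {prob_any_slow(n):.4f}")
```

With a fan-out of 100, more than 60% of user requests would wait on at least one sub-request from the slowest 1% under this model, which is why reducing the per-server 99th percentile, rather than only the mean, pays off at scale.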