Incoming and outgoing processing for a given TCP connection often execute on different cores: an incoming packet is typically processed on the core that receives the interrupt, while outgoing data processing occurs on the core running the relevant user code. As a result, accesses to read/write connection state (such as TCP control blocks) often involve cache invalidations and data movement between cores' caches. These cache misses can take hundreds of processor cycles, enough to significantly reduce performance. We present a new design, called Affinity-Accept, that causes all processing for a given TCP connection to occur on the same core. Affinity-Accept lets the network interface card determine, in a lightweight way, the core on which application processing for each new connection occurs; it overrides the card's choices only in response to imbalances in CPU load. Measurements show that for the Apache web server serving static files on a 48-core AMD system, Affinity-Accept reduces time spent in the TCP stack by 30% and improves overall throughput by 24%.
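To make the same-core idea concrete, here is a minimal userspace sketch, assuming a Linux kernel with SO_REUSEPORT (3.9+). Affinity-Accept itself works inside the kernel's accept path together with the NIC's flow steering; this sketch only approximates the effect from user space: one listening socket per core, each serviced by a thread pinned to that core, so a connection's accept(), reads, and application work all run on one core. NUM_WORKERS, PORT, and worker() are illustrative names, not from the paper.

#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define NUM_WORKERS 4   /* one listener + worker thread per core (hypothetical count) */
#define PORT 8080

static void *worker(void *arg)
{
    long core = (long)arg;

    /* Pin this thread to its core so connections it accepts stay local. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET((int)core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    /* Per-core listener: SO_REUSEPORT lets every core bind the same port,
     * and the kernel steers each new connection to one of the listeners. */
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    setsockopt(lfd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(PORT);
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 128) < 0) {
        perror("bind/listen");
        return NULL;
    }

    for (;;) {
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0)
            continue;
        /* Handle the connection entirely on this core, then close it.
         * A real server would parse requests here; this one just echoes. */
        char buf[512];
        ssize_t n = read(cfd, buf, sizeof(buf));
        if (n > 0)
            write(cfd, buf, (size_t)n);
        close(cfd);
    }
    return NULL;
}

int main(void)
{
    pthread_t tids[NUM_WORKERS];
    for (long i = 0; i < NUM_WORKERS; i++)
        pthread_create(&tids[i], NULL, worker, (void *)i);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tids[i], NULL);
    return 0;
}

Note the gap this sketch leaves open: the kernel's reuseport selection hashes on the connection 4-tuple, which does not by itself match the core the NIC's receive-side scaling chose for the packet. Closing that gap requires aligning each receive queue's interrupt affinity with the pinned threads (and, on newer kernels, SO_INCOMING_CPU can bias reuseport selection toward the packet-processing CPU); performing that matching, and rebalancing it under load imbalance, is exactly the part Affinity-Accept does inside the kernel.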