Performance scalability of a multi-core web server
Proceedings of the 3rd ACM/IEEE Symposium on Architectures for Networking and Communications Systems
As technology trends push future microprocessors toward chip multiprocessor designs, operating system network stacks must be parallelized to keep pace with improvements in network bandwidth. There are two competing strategies for stack parallelization. Message-parallel network stacks use concurrent threads to carry out network operations on independent messages (usually packets), whereas connection-parallel stacks map operations to groups of connections and permit concurrent processing only on independent connection groups. Connection-parallel stacks can use either locks or threads to serialize access to connection groups. This paper evaluates these parallel stack organizations using a modern operating system and chip multiprocessor hardware. Compared to uniprocessor kernels, all parallel stack organizations incur additional locking overhead, cache inefficiencies, and scheduling overhead. However, the organizations balance these limitations differently, leading to variations in peak performance and connection scalability. Lock-serialized connection-parallel organizations reduce the locking overhead of message-parallel organizations by using many connection groups, and they eliminate the expensive thread handoff mechanism of thread-serialized connection-parallel organizations. The resulting organization outperforms the others, delivering 5.4 Gb/s of TCP throughput for most connection loads and providing a 126% throughput improvement over a uniprocessor kernel under the heaviest connection loads.