Performance scalability of a multi-core web server
Proceedings of the 3rd ACM/IEEE Symposium on Architectures for Networking and Communications Systems
As technology trends push future microprocessors toward chip multiprocessor designs, operating system network stacks must be parallelized to keep pace with improvements in network bandwidth. There are two competing strategies for stack parallelization. Message-parallel network stacks use concurrent threads to carry out network operations on independent messages (usually packets), whereas connection-parallel stacks map operations to groups of connections and permit concurrent processing only on independent connection groups. Connection-parallel stacks can use either locks or threads to serialize access to connection groups. This paper evaluates these parallel stack organizations using a modern operating system and chip multiprocessor hardware. Compared to uniprocessor kernels, all parallel stack organizations incur additional locking overhead, cache inefficiencies, and scheduling overhead. However, the organizations balance these limitations differently, leading to variations in peak performance and connection scalability. Lock-serialized connection-parallel organizations reduce the locking overhead of message-parallel organizations by using many connection groups, and they eliminate the expensive thread handoff mechanism of thread-serialized connection-parallel organizations. The resulting organization outperforms the others, delivering 5.4 Gb/s of TCP throughput for most connection loads and providing a 126% throughput improvement over a uniprocessor kernel under the heaviest connection loads.