A new server I/O architecture for high speed networks

Authors:
Guangdeng Liao;Xia Zhu;Laxmi Bhuyan
Affiliations:
University of California, Riverside;Intel Labs;University of California, Riverside
Venue:
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Year:
2011

Citing 0
Cited 7

ReNIC: Architectural extension to SR-IOV I/O virtualization for efficient replication
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers

High performance network virtualization with SR-IOV
Journal of Parallel and Distributed Computing

Composable thermal modeling and simulation for architecture-level thermal designs of multicore microprocessors
ACM Transactions on Design Automation of Electronic Systems (TODAES)

Rethinking network stack design with memory snapshots
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems

Characterizing the impact of end-system affinities on the end-to-end performance of high-speed flows
NDM '13 Proceedings of the Third International Workshop on Network-Aware Data Management

Improving server application performance via pure TCP ACK receive optimization
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference

High-Performance network traffic processing systems using commodity hardware
Data Traffic Monitoring and Analysis


Abstract

Traditional architectural designs normally focus on CPUs and have often been decoupled from I/O considerations. They are inefficient for high-speed network processing at bandwidths of 10 Gbps and beyond. Long-latency I/O interconnects on mainstream servers also substantially complicate NIC designs. In this paper, we start with fine-grained driver and OS instrumentation to fully understand the network processing overhead over 10GbE on mainstream servers. We obtain several new findings: 1) besides the data copy identified by previous work, the driver and buffer release are two unexpected major overheads (up to 54%); 2) the major source of the overheads is memory stalls, and data related to the socket buffer (SKB) and page data structures are mainly responsible for the stalls; 3) prevailing platform optimizations such as Direct Cache Access (DCA) are insufficient for addressing the network processing bottlenecks. Motivated by these studies, we propose a new server I/O architecture in which DMA descriptor management is shifted from NICs to an on-chip network engine (NEngine), and descriptors are extended with information about the data that incur memory stalls. NEngine relies on data lookups and preloads data to eliminate the stalls during network processing. Moreover, NEngine implements efficient packet movement inside caches to address the remaining issues in data copy. The new architecture gives the DMA engine very fast access to descriptors and keeps packets in CPU caches instead of NIC buffers, significantly simplifying NICs. Experimental results demonstrate that the new server I/O architecture improves network processing efficiency by 47% and web server throughput by 14%, while substantially reducing NIC hardware complexity.
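
The idea of extending descriptors with hints about stall-prone data, so that an on-chip engine can preload that data before the CPU touches it, can be illustrated with a toy model. The sketch below is not the paper's design: the `Descriptor` fields, the `ToyCache` class, and the `process_ring` function are all hypothetical names chosen for illustration, and the "cache" is just a set of addresses that counts misses.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Descriptor:
    """Hypothetical receive descriptor (illustrative only).

    stall_hints is an assumed extension field listing addresses of
    metadata, such as SKB and page structures, that the CPU will touch.
    """
    packet_addr: int
    length: int
    stall_hints: List[int] = field(default_factory=list)


class ToyCache:
    """Toy cache model: a set of resident addresses plus a miss counter."""

    def __init__(self) -> None:
        self.resident: Set[int] = set()
        self.misses = 0

    def preload(self, addr: int) -> None:
        # Preloading makes the address resident without counting a miss.
        self.resident.add(addr)

    def access(self, addr: int) -> None:
        # A CPU access to a non-resident address counts as one miss (stall).
        if addr not in self.resident:
            self.misses += 1
            self.resident.add(addr)


def process_ring(ring: List[Descriptor], cache: ToyCache, preload: bool) -> int:
    """Process every descriptor and return the number of CPU-side misses."""
    for desc in ring:
        if preload:
            # Engine step (assumed): bring packet and hinted metadata in early.
            cache.preload(desc.packet_addr)
            for addr in desc.stall_hints:
                cache.preload(addr)
        # CPU step: touch the packet and its metadata.
        cache.access(desc.packet_addr)
        for addr in desc.stall_hints:
            cache.access(addr)
    return cache.misses


def make_ring(n: int) -> List[Descriptor]:
    """Build n descriptors with distinct packet and metadata addresses."""
    return [
        Descriptor(
            packet_addr=0x1000 * i,
            length=1500,
            stall_hints=[0x100000 + 64 * i, 0x200000 + 64 * i],
        )
        for i in range(n)
    ]


baseline = process_ring(make_ring(4), ToyCache(), preload=False)
with_hints = process_ring(make_ring(4), ToyCache(), preload=True)
print(baseline, with_hints)
```

In this toy model every CPU access misses without preloading and none miss with it; a real system would see smaller gains that depend on cache capacity, timing, and hint accuracy.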