Ethernet continues to be the most widely used network architecture today because of its low cost and its backward compatibility with existing Ethernet infrastructure. Driven by the increasing networking demands of cloud workloads, network speeds are rapidly migrating from 1 Gbps to 10 Gbps and beyond. Ethernet's ubiquity and its continually increasing rate motivate us to fully understand the performance and power efficiency of high-speed network processing. In this paper, we begin with a breakdown of per-packet processing overhead on Intel Xeon servers with 10 GbE networking. We find that, besides data copy, the driver and buffer release unexpectedly take 46% of the processing time for large I/O sizes and as much as 54% for small I/O sizes. To further understand these overheads, we manually instrument the 10 GbE NIC driver and the OS kernel along the packet processing path using hardware performance counters (the performance monitoring unit, PMU). This fine-grained instrumentation pinpoints performance bottlenecks that have not been reported before. In addition to the detailed performance analysis, we examine the power consumption of network processing over 10 GbE using a power analyzer. We then use an external data acquisition system (DAQ) to break down power consumption by individual hardware component, such as the CPU, memory, and NIC, and make several interesting observations. Our detailed performance and power analysis guides the design of a more processing- and power-efficient server I/O architecture for high-speed networks.
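A per-packet overhead breakdown of the kind described above can be illustrated with a small aggregation sketch. It assumes that each instrumented packet yields an ordered list of checkpoints, each pairing a stage name with a cycle-counter reading taken on entry to that stage (for example, from the timestamp counter), and that the cycles between two consecutive checkpoints belong to the earlier stage. The stage names (`driver`, `ip_tcp`, `data_copy`, `buffer_release`, `end`), the functions `stage_cycles` and `breakdown`, and all cycle counts are hypothetical and made up for illustration; they are not the authors' instrumentation or measurements.

```python
from collections import defaultdict


def stage_cycles(trace):
    """Attribute the cycles between consecutive checkpoints to the earlier checkpoint's stage.

    `trace` is a list of (stage_name, cycle_count) pairs in the order they were recorded.
    The final checkpoint only marks the end of processing and gets no cycles of its own.
    """
    totals = defaultdict(int)
    for (stage, start), (_, end) in zip(trace, trace[1:]):
        if end < start:
            raise ValueError(f"non-monotonic timestamp after stage {stage!r}")
        totals[stage] += end - start
    return totals


def breakdown(traces):
    """Aggregate many per-packet traces and return each stage's share of total cycles in percent."""
    totals = defaultdict(int)
    for trace in traces:
        for stage, cycles in stage_cycles(trace).items():
            totals[stage] += cycles
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {stage: 100.0 * cycles / grand_total for stage, cycles in totals.items()}


# Two hypothetical per-packet traces; the cycle counts are invented, not measured.
traces = [
    [("driver", 0), ("ip_tcp", 1200), ("data_copy", 2000),
     ("buffer_release", 4800), ("end", 5400)],
    [("driver", 10000), ("ip_tcp", 11000), ("data_copy", 11900),
     ("buffer_release", 14500), ("end", 15000)],
]

shares = breakdown(traces)
for stage, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{stage:15s} {pct:5.1f}%")
```

Aggregating raw cycles across all packets before dividing, rather than averaging per-packet percentages, weights each stage by the time it actually consumed. This matters when packet sizes, and therefore copy costs, vary across the trace.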