Performance Analysis of System Overheads in TCP/IP Workloads

Authors:
Nathan L. Binkert;Lisa R. Hsu;Ali G. Saidi;Ronald G. Dreslinski;Andrew L. Schultz;Steven K. Reinhardt
Affiliations:
Advanced Computer Architecture Lab EECS Department, University of Michigan;Advanced Computer Architecture Lab EECS Department, University of Michigan;Advanced Computer Architecture Lab EECS Department, University of Michigan;Advanced Computer Architecture Lab EECS Department, University of Michigan;Advanced Computer Architecture Lab EECS Department, University of Michigan;Advanced Computer Architecture Lab EECS Department, University of Michigan
Venue:
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Year:
2005

Abstract

Current high-performance computer systems are unable to saturate the latest available high-bandwidth networks such as 10 Gigabit Ethernet. A key obstacle to achieving 10 gigabits per second is the high overhead of communication between the CPU and the network interface controller (NIC), which typically resides on a standard I/O bus with high access latency. Using several network-intensive benchmarks, we investigate the impact of this overhead by analyzing the performance of hypothetical systems in which the NIC is more closely coupled to the CPU, including integration on the CPU die. We find that systems with high-latency NICs spend a significant amount of time in the device driver. NIC integration can substantially reduce this overhead, providing significant throughput benefits when other CPU processing is not a bottleneck. NIC integration also enables cache placement of DMA data. This feature has tremendous benefits when payloads are touched quickly, but can harm performance in other situations due to cache pollution.