T: a multithreaded massively parallel architecture
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
A tightly-coupled processor-network interface
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Using the SimOS machine simulator to study complex computer systems
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Optimizing 10-Gigabit Ethernet for Networks of Workstations, Clusters, and Grids: A Case Study
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Direct Cache Access for High Bandwidth Network I/O
Proceedings of the 32nd annual international symposium on Computer Architecture
TCP offload is a dumb idea whose time has come
HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
Lazy direct-to-cache transfer during receive operations in a message passing environment
Proceedings of the 3rd conference on Computing frontiers
Integrated network interfaces for high-bandwidth TCP/IP
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Evaluating network processing efficiency with processor partitioning and asynchronous I/O
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Protocol offload analysis by simulation
Journal of Systems Architecture: the EUROMICRO Journal
SpringSim '09 Proceedings of the 2009 Spring Simulation Multiconference
Microprocessors & Microsystems
DNCOCO'09 Proceedings of the 8th WSEAS international conference on Data networks, communications, computers
Predictive-flow-queue-based energy optimization for gigabit ethernet controllers
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
WSEAS TRANSACTIONS on COMMUNICATIONS
The partition algorithm of an equivalent queuing model for serial-parallel connection servers
AIC'10/BEBI'10 Proceedings of the 10th WSEAS international conference on applied informatics and communications, and 3rd WSEAS international conference on Biomedical electronics and biomedical informatics
Proceedings of the 3rd International Conference on Future Energy Systems: Where Energy, Computing and Communication Meet
Analyzing performance and power efficiency of network processing over 10 GbE
Journal of Parallel and Distributed Computing
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Thin servers with smart pipes: designing SoC accelerators for memcached
Proceedings of the 40th Annual International Symposium on Computer Architecture
Hi-index | 0.00 |
Current high-performance computer systems are unable to saturate the latest available high-bandwidth networks such as 10 Gigabit Ethernet. A key obstacle in achieving 10 gigabits per second is the high overhead of communication between the CPU and network interface controller (NIC), which typically resides on a standard I/O bus with high access latency. Using several network-intensive benchmarks, we investigate the impact of this overhead by analyzing the performance of hypothetical systems in which the NIC is more closely coupled to the CPU, including integration on the CPU die. We find that systems with high-latency NICs spend a significant amount of time in the device driver. NIC integration can substantially reduce this overhead, providing significant throughput benefits when other CPU processing is not a bottleneck. NIC integration also enables cache placement of DMA data. This feature has tremendous benefits when payloads are touched quickly, but potentially can harm performance in other situations due to cache pollution.