Although extremely high-speed interconnects are available today, traditional protocol stacks such as TCP/IP and UDP/IP cannot utilize the maximum network bandwidth because of overheads inherent in the stacks. These overheads are a major obstacle for high-performance computing applications that try to exploit high-speed interconnects in cluster environments. To address this issue, many researchers have analyzed protocol overheads and suggested optimization approaches for running the TCP/IP suite over high-speed interconnects. However, to the best of our knowledge, no study analyzes and optimizes the protocol overheads thoroughly in an integrated manner. In this paper, we apply a set of protocol optimization mechanisms in an integrated manner, covering the full spectrum of protocol layers from the transport layer to the physical layer. To evaluate the impact of each protocol overhead, we apply the optimization mechanisms one by one and perform a detailed analysis at each step. The thorough overhead measurements and analyses reveal the dependencies between protocol overheads. With our comprehensive optimizations, we show that UDP/IP can utilize more than 95% of the maximum network throughput that a Myrinet-based experimental system can provide.