Software overhead in messaging layers: where does the time go?
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
U-Net: a user-level network interface for parallel and distributed computing
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
CRL: high-performance all-software distributed shared memory
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
High performance messaging on workstations: Illinois fast messages (FM) for Myrinet
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Polling watchdog: combining polling and interrupts for efficient message handling
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Multicasting protocols for high-speed, wormhole-routing local area networks
Conference proceedings on Applications, technologies, architectures, and protocols for computer communications
Effects of communication latency, overhead, and bandwidth in a cluster architecture
Proceedings of the 24th annual international symposium on Computer architecture
Performance evaluation of the Orca shared-object system
ACM Transactions on Computer Systems (TOCS)
Transposition table driven work scheduling in distributed search
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Soft timers: efficient microsecond software timer support for network processing
Proceedings of the seventeenth ACM symposium on Operating systems principles
User-space communication: a quantitative study
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Assessing Fast Network Interfaces
IEEE Micro
Optimal Multicast with Packetization and Network Interface Support
ICPP '97 Proceedings of the international Conference on Parallel Processing
Efficient Multicast on Myrinet using Link-Level Flow Control
ICPP '98 Proceedings of the 1998 International Conference on Parallel Processing
PM: An Operating System Coordinated High Performance Communication Library
HPCN Europe '97 Proceedings of the International Conference and Exhibition on High-Performance Computing and Networking
Limits to the Performance of Software Shared Memory: A Layered Approach
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
(R) Efficient Reliable Multicast on MYRINET
ICPP '96 Proceedings of the Proceedings of the 1996 International Conference on Parallel Processing - Volume 3
Source-level global optimizations for fine-grain distributed shared memory systems
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
The distributed ASCI Supercomputer project
ACM SIGOPS Operating Systems Review
Efficient Java RMI for parallel programming
ACM Transactions on Programming Languages and Systems (TOPLAS)
A Performance Analysis of Transposition-Table-Driven Work Scheduling in Distributed Search
IEEE Transactions on Parallel and Distributed Systems
EMP: zero-copy OS-bypass NIC-driven gigabit ethernet message passing
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
A dynamic application-driven data communication strategy
Proceedings of the 18th annual international conference on Supercomputing
Cluster communication protocols for parallel-programming systems
ACM Transactions on Computer Systems (TOCS)
Hi-index | 0.00 |
We systematically evaluate the performance of five implementations of a single, user-level communication interface. Each implementation makes different architectural assumptions about the reliability of the network hardware and the capabilities of the network interface. The implementations differ accordingly in their division of protocol tasks between host software, network-interface firmware, and network hardware. Using microbenchmarks, parallel-programming systems, and parallel applications, we assess the performance impact of different protocol decompositions. We show how moving protocol tasks to a relatively slow network interface yields both performance advantages and disadvantages, depending on the characteristics of the application and the underlying parallel-programming system. In particular, we show that a communication system that assumes highly reliable network hardware and that uses network-interface support to process multicast traffic performs best for all applications.