Scheduler activations: effective kernel support for the user-level management of parallelism
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Software overhead in messaging layers: where does the time go?
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
U-Net: a user-level network interface for parallel and distributed computing
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
High performance messaging on workstations: Illinois fast messages (FM) for Myrinet
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
UTLB: a mechanism for address translation on network interfaces
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
The effects of communication parameters on end performance of shared virtual memory clusters
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
FM-QoS: real-time communication using self-synchronizing schedules
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Assessing Fast Network Interfaces
IEEE Micro
Overview of memory channel network for PCI
COMPCON '96 Proceedings of the 41st IEEE International Computer Conference
ATM and Fast Ethernet Network Interfaces for User-level Communication
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Exploiting Two-Case Delivery for Fast Protected Messaging
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Address Translation Mechanisms In Network Interfaces
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Active Message Applications Programming Interface
Active Message Applications Programming Interface
Efficient connection-oriented communication on high-performance networks
Efficient connection-oriented communication on high-performance networks
Improving the performance of shared virtual memory on system area networks
Improving the performance of shared virtual memory on system area networks
High-performance local area communication with fast sockets
ATEC '97 Proceedings of the annual conference on USENIX Annual Technical Conference
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
BOS is boss: a case for bulk-synchronous object systems
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
An efficient communication architecture for commodity supercomputers
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Evaluating design alternatives for reliable communication on high-speed networks
ACM SIGPLAN Notices
Accelerating shared virtual memory via general-purpose network interface support
ACM Transactions on Computer Systems (TOCS)
Evaluating design alternatives for reliable communication on high-speed networks
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
QoS provisioning in clusters: an investigation of Router and NIC design
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Semi-User-Level Communication Architecture
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
PCI-DDC Application Programming Interface: Performance in User-Level Messaging (Research Note)
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Cluster communication protocols for parallel-programming systems
ACM Transactions on Computer Systems (TOCS)
PRESS: A Clustered Server Based on User-Level Communication
IEEE Transactions on Parallel and Distributed Systems
Impact of Page Size on Communication Performance
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9 - Volume 10
Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs
The Journal of Supercomputing
Efficient remote block-level I/O over an RDMA-capable NIC
Proceedings of the 20th annual international conference on Supercomputing
The Design and Implementation of a Domain-Specific Language for Network Performance Testing
IEEE Transactions on Parallel and Distributed Systems
Towards 100 gbit/s ethernet: multicore-based parallel communication protocol design
Proceedings of the 23rd international conference on Supercomputing
Evaluation of compound system calls in the Linux kernel
ACM SIGOPS Operating Systems Review
Operating system support for multimedia systems
Computer Communications
RDMA in the SiCortex cluster systems
PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Hi-index | 0.00 |
Powerful commodity systems and networks offer a promising direction for high performance computing because they are inexpensive and they closely track technology progress. However, high, raw-hardware performance is rarely delivered to the end user. Previous work has shown that the bottleneck in these architectures is the overheads imposed by the software communication layer. To reduce these overheads, researchers have proposed a number of user-space communication models. The common feature of these models is that applications have direct access to the network, bypassing the operating system in the common case and thus avoiding the cost of send/receive system calls.In this paper we examine five user-space communication layers, that represent different points in the configuration space: Generic AM, BIP-0.92, FM-2.02, PM-1.2, and VMMC-2. Although these systems support different communication paradigms and employ a variety of different implementation tradeoffs, we are able to quantitatively compare them on a single testbed consisting of a cluster of high-end PCs connected by a Myrinet network.We find that all five communication systems have very low latency for small messages, in the range of 5 to 17 µs. Not surprisingly, this range is strongly influenced by the functionality offered by each system. We are encouraged, however, to find that features such as protected and reliable communication at user level and multiprogramming can be provided at very low cost. Bandwidth, however, depends primarily on how data is transferred between host memory and the network. Most of the investigated libraries support zero-copy protocols for certain types of data transfers, but differ significantly in the bandwidth delivered to end users. The highest bandwidth, between 95 and 125 MBytes/s for long message transfers, is delivered by libraries that use DMA on both send and receive sides and avoid all data copies. Libraries that perform additional data copies or use programmed I/O to send data to the network achieve lower maximum bandwidth, in the range of 60-70 MBytes/s.