User-space communication: a quantitative study

Authors:
Soichiro Araki; Angelos Bilas; Cezary Dubnicki; Jan Edler; Koichi Konishi; James Philbin
Affiliations:
NEC Corp., Kawasaki, Japan; Princeton University, Princeton, New Jersey; Princeton University, Princeton, New Jersey; NEC Research Institute, Inc., Princeton, New Jersey; NEC Corp., Kawasaki, Japan; NEC Research Institute, Inc., Princeton, New Jersey
Venue:
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Year:
1998

Abstract

Powerful commodity systems and networks offer a promising direction for high-performance computing because they are inexpensive and closely track technology progress. However, the high raw hardware performance is rarely delivered to the end user. Previous work has shown that the bottleneck in these architectures is the overhead imposed by the software communication layer. To reduce this overhead, researchers have proposed a number of user-space communication models. The common feature of these models is that applications have direct access to the network, bypassing the operating system in the common case and thus avoiding the cost of send/receive system calls.

In this paper we examine five user-space communication layers that represent different points in the configuration space: Generic AM, BIP-0.92, FM-2.02, PM-1.2, and VMMC-2. Although these systems support different communication paradigms and employ a variety of implementation tradeoffs, we are able to compare them quantitatively on a single testbed consisting of a cluster of high-end PCs connected by a Myrinet network.

We find that all five communication systems have very low latency for small messages, in the range of 5 to 17 µs. Not surprisingly, this range is strongly influenced by the functionality offered by each system. We are encouraged, however, to find that features such as protected and reliable communication at user level and multiprogramming can be provided at very low cost. Bandwidth, in contrast, depends primarily on how data is transferred between host memory and the network. Most of the investigated libraries support zero-copy protocols for certain types of data transfers, but differ significantly in the bandwidth delivered to end users. The highest bandwidth, between 95 and 125 MBytes/s for long message transfers, is delivered by libraries that use DMA on both send and receive sides and avoid all data copies. Libraries that perform additional data copies or use programmed I/O to send data to the network achieve lower maximum bandwidth, in the range of 60-70 MBytes/s.
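
To make the interplay between the reported latency and bandwidth figures concrete, the following is a minimal sketch that assumes a simple linear cost model, time(n) = latency + n / peak bandwidth. This model is an illustrative assumption and is not taken from the paper; the function names are hypothetical, and the latency/bandwidth pairings in the example are chosen only to fall inside the ranges quoted in the abstract, not to describe any specific library.

```python
# Illustrative linear cost model (an assumption, not the paper's methodology):
#   time(n) = latency + n / peak_bandwidth
# Units: microseconds for time, bytes for size, MBytes/s for bandwidth,
# with 1 MByte = 10**6 bytes, so 1 MByte/s equals 1 byte per microsecond.


def transfer_time_us(size_bytes, latency_us, peak_mbytes_per_s):
    """Estimated one-way transfer time in microseconds."""
    return latency_us + size_bytes / peak_mbytes_per_s


def effective_bandwidth(size_bytes, latency_us, peak_mbytes_per_s):
    """Delivered bandwidth in MBytes/s for a single message."""
    return size_bytes / transfer_time_us(size_bytes, latency_us, peak_mbytes_per_s)


def half_power_size(latency_us, peak_mbytes_per_s):
    """Message size in bytes at which delivered bandwidth is half the peak."""
    return latency_us * peak_mbytes_per_s


if __name__ == "__main__":
    # Hypothetical configurations inside the ranges quoted in the abstract.
    configs = [
        ("DMA on both sides, no copies", 10.0, 125.0),
        ("extra copies or programmed I/O", 10.0, 60.0),
    ]
    for name, latency_us, peak in configs:
        print(name)
        print("  half-power message size:", half_power_size(latency_us, peak), "bytes")
        for size in (64, 1024, 16 * 1024, 1024 * 1024):
            bw = effective_bandwidth(size, latency_us, peak)
            print(f"  {size:>8} bytes -> {bw:6.1f} MBytes/s")
```

Under this model, small messages are dominated by the fixed latency term, and the peak bandwidth only becomes visible for messages well above the half-power size, which is consistent with the abstract quoting its bandwidth figures for long message transfers.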