High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet

Authors:
Scott Pakin;Mario Lauria;Andrew Chien
Affiliations:
University of Illinois at Urbana-Champaign;Università di Napoli "Federico II";University of Illinois at Urbana-Champaign
Venue:
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Year:
1995

Citing 0
Cited 37

Improving the Throughput of Remote Storage Access through Pipelining

GRID '02 Proceedings of the Third International Workshop on Grid Computing
Optimal Multicast with Packetization and Network Interface Support

ICPP '97 Proceedings of the International Conference on Parallel Processing
Using Programmable NICs for Time-Warp Optimization

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Incorporating Quality-of-Service in the Virtual Interface Architecture

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Exploiting the Capabilities of Communications Co-Processors

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Platform-Independent Runtime Optimizations Using OpenThreads

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
CMC: A Coscheduling Model for non-Dedicated Cluster Computing

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Priority Based Messaging for Software Distributed Shared Memory

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
The MultiCluster Model to the Integrated Use of Multiple Workstation Clusters

IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
VIBe: A Micro-benchmark Suite for Evaluating Virtual Interface Architecture (VIA) Implementations

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Time-Sharing Parallel Jobs in the Presence of Multiple Resource Requirements

IPDPS '00/JSSPP '00 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
An Efficient and Scalable Coscheduling Technique for Large Symmetric Multiprocessor Clusters

JSSPP '01 Revised Papers from the 7th International Workshop on Job Scheduling Strategies for Parallel Processing
VIA Communication Performance on a Gigabit Ethernet Cluster

Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Converse: An Interoperable Framework for Parallel Programming

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Asynchronous MPI messaging on Myrinet

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Integrating polling, interrupts, and thread management

FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Heterogeneous Distributed Virtual Machines in the Harness Metacomputing Framework

HCW '99 Proceedings of the Eighth Heterogeneous Computing Workshop
Communication Modeling of Heterogeneous Networks of Workstations for Performance Characterization of Collective Operations

HCW '99 Proceedings of the Eighth Heterogeneous Computing Workshop
Shared Memory NUMA Programming on I-WAY

HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
Automatic exploitation of dual level parallelism on a network of multiprocessors

HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
A Network Co-processor-Based Approach to Scalable Media Streaming in Servers

ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Optimizing Parallel Applications for Wide-Area Clusters

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Network Technologies

International Journal of High Performance Computing Applications
Optimization of MPI collective communication on BlueGene/L systems

Proceedings of the 19th annual international conference on Supercomputing
Short note: Modeling and optimization of non-blocking checkpointing for optimistic simulation on myrinet clusters

Journal of Parallel and Distributed Computing
Exploiting NIC architectural support for enhancing IP-based protocols on high-performance networks

Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part II
Implementation and performance study of a hardware-VIA-based network adapter on gigabit ethernet

Journal of Systems Architecture: the EUROMICRO Journal
MMR: A MultiMedia Router architecture to support hybrid workloads

Journal of Parallel and Distributed Computing
Software distributed shared memory over virtual interface architecture: implementation and performance

ALS'00 Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
Porting a user-level communication architecture to NT: experiences and performance

WINSYM'99 Proceedings of the 3rd conference on USENIX Windows NT Symposium - Volume 3
Design and implementation of message-passing services for the Blue Gene/L supercomputer

IBM Journal of Research and Development
Blue Gene/L performance tools

IBM Journal of Research and Development
The ParaStation project: Using workstations as building blocks for parallel computing

Information Sciences: an International Journal
ToCL: a thread oriented communication library to interface VIA and GM protocols

ICCS'03 Proceedings of the 2003 international conference on Computational science: Part II
Research: Characterizing and scheduling communication interactions of parallel and local jobs on networks of workstations

Computer Communications
NetSlices: scalable multi-core packet processing in user-space

Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems
Scale-out NUMA

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Abstract

In most computer systems, software overhead dominates the cost of messaging, reducing delivered performance, especially for short messages. Efficient software messaging layers are needed to deliver the hardware performance to the application level and to support tightly-coupled workstation clusters. Illinois Fast Messages (FM) 1.0 is a high-speed messaging layer that delivers low latency and high bandwidth for short messages. For 128-byte packets, FM achieves bandwidths of 16.2 MB/s and one-way latencies of 32 µs on Myrinet-connected SPARCstations (user-level to user-level). For shorter packets, we have measured one-way latencies of 25 µs, and for larger packets, bandwidth as high as 19.6 MB/s, which is delivered bandwidth greater than OC-3. FM is also superior to the Myrinet API messaging layer, not just in terms of latency and usable bandwidth, but also in terms of the message half-power point ($n_{1/2}$), which is two orders of magnitude smaller (54 vs. 4,409 bytes). We describe the FM messaging primitives and the critical design issues in building low-latency messaging layers for workstation clusters. Several issues are critical: the division of labor between host and network coprocessor, management of the input/output (I/O) bus, and buffer management. To achieve high performance, messaging layers should assign as much functionality as possible to the host. If the network interface has DMA capability, the I/O bus should be used asymmetrically, with the host processor moving data to the network and exploiting DMA to move data to the host. Finally, buffer management should be extremely simple in the network coprocessor and match queue structures between the network coprocessor and host memory. Detailed measurements show how each of these features contributes to high performance.
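To make the half-power point concrete, the sketch below assumes the common linear cost model T(n) = t0 + n / r_inf, under which n_{1/2} = t0 * r_inf and an n-byte message delivers the fraction n / (n + n_{1/2}) of the asymptotic bandwidth r_inf. That model is an assumption made here for illustration; the abstract does not say how the measured curves were fitted, and the function names are hypothetical. Only the two n_{1/2} values (54 and 4,409 bytes) come from the abstract.

```python
def half_power_point(t0_seconds: float, r_inf_bytes_per_s: float) -> float:
    """Message size (bytes) at which half of r_inf is delivered.

    Assumes the linear cost model T(n) = t0 + n / r_inf (an assumption,
    not a claim about how the paper derived its numbers).
    """
    if t0_seconds < 0 or r_inf_bytes_per_s <= 0:
        raise ValueError("t0 must be >= 0 and r_inf must be > 0")
    return t0_seconds * r_inf_bytes_per_s


def fraction_of_peak(n_bytes: float, n_half: float) -> float:
    """Fraction of asymptotic bandwidth delivered for an n-byte message."""
    if n_bytes < 0 or n_half <= 0:
        raise ValueError("n_bytes must be >= 0 and n_half must be > 0")
    return n_bytes / (n_bytes + n_half)


if __name__ == "__main__":
    # n_half values quoted in the abstract; the 128-byte size is the
    # packet size the abstract uses for its headline measurements.
    for label, n_half in (("FM", 54), ("Myrinet API", 4409)):
        frac = fraction_of_peak(128, n_half)
        print(f"{label}: {frac:.3f} of peak bandwidth at 128 bytes (model)")
```

Under this model, a layer with a small n_{1/2} reaches most of its peak bandwidth even for short messages, which is why the abstract stresses the half-power point alongside latency and bandwidth. The printed fractions are model outputs, not measurements from the paper.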