High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet

  • Authors:
  • Scott Pakin;Mario Lauria;Andrew Chien

  • Affiliations:
  • University of Illinois at Urbana-Champaign;Università di Napoli "Federico II";University of Illinois at Urbana-Champaign

  • Venue:
  • Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
  • Year:
  • 1995

Abstract

In most computer systems, software overhead dominates the cost of messaging, reducing delivered performance, especially for short messages. Efficient software messaging layers are needed to deliver the hardware performance to the application level and to support tightly-coupled workstation clusters. Illinois Fast Messages (FM) 1.0 is a high-speed messaging layer that delivers low latency and high bandwidth for short messages. For 128-byte packets, FM achieves bandwidths of 16.2 MB/s and one-way latencies of 32 µs on Myrinet-connected SPARCstations (user-level to user-level). For shorter packets, we have measured one-way latencies of 25 µs, and for larger packets, bandwidth as high as 19.6 MB/s, a delivered bandwidth greater than OC-3. FM is also superior to the Myrinet API messaging layer, not just in terms of latency and usable bandwidth, but also in terms of the message half-power point ($n_{1/2}$), which is two orders of magnitude smaller (54 vs. 4,409 bytes). We describe the FM messaging primitives and the critical design issues in building low-latency messaging layers for workstation clusters. Several issues are critical: the division of labor between host and network coprocessor, management of the input/output (I/O) bus, and buffer management. To achieve high performance, messaging layers should assign as much functionality as possible to the host. If the network interface has DMA capability, the I/O bus should be used asymmetrically, with the host processor moving data to the network and exploiting DMA to move data to the host. Finally, buffer management should be extremely simple in the network coprocessor and match queue structures between the network coprocessor and host memory. Detailed measurements show how each of these features contributes to high performance.
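
The half-power point is the message size at which a messaging layer delivers half of its peak bandwidth. The sketch below illustrates why it tracks per-message overhead. It assumes the common linear cost model, in which sending n bytes takes a fixed startup time plus n divided by the peak bandwidth. That model is an assumption made here for illustration and is not stated in the abstract. The function names and the parameter values (a 4 µs startup cost and a 20 MB/s peak) are hypothetical and are not FM measurements.

```python
def transfer_time(n_bytes, t0_s, r_inf):
    """Time to send n_bytes under the assumed linear model: t0 + n / r_inf."""
    return t0_s + n_bytes / r_inf


def bandwidth(n_bytes, t0_s, r_inf):
    """Delivered bandwidth (bytes/s) for a message of n_bytes."""
    return n_bytes / transfer_time(n_bytes, t0_s, r_inf)


def half_power_point(t0_s, r_inf):
    """Message size at which delivered bandwidth equals r_inf / 2.

    Solving n / (t0 + n / r_inf) = r_inf / 2 for n gives n = t0 * r_inf.
    """
    return t0_s * r_inf


# Hypothetical parameters, chosen only to illustrate the relationship.
T0 = 4e-6       # fixed per-message startup cost, in seconds
R_INF = 20e6    # peak (asymptotic) bandwidth, in bytes per second

n_half = half_power_point(T0, R_INF)
print(f"half-power point: {n_half:.0f} bytes")
print(f"bandwidth there:  {bandwidth(n_half, T0, R_INF) / 1e6:.1f} MB/s")
```

Under this model the half-power point is the product of startup cost and peak bandwidth, so at a fixed peak bandwidth a smaller value corresponds to lower per-message overhead. A layer with a small half-power point therefore reaches a large fraction of its peak bandwidth even for short messages.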