Experience with active messages on the Meiko CS-2

Authors:
Klaus E. Schauser;Chris J. Scheiman
Venue:
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Year:
1995

Abstract

Active messages provide a low-latency communication architecture that, on modern parallel machines, achieves more than an order of magnitude performance improvement over more traditional communication libraries. This paper discusses the experience we gained while implementing active messages on the Meiko CS-2, and discusses implementations for similar architectures. We found that architectures which only support efficient remote write operations (or DMA transfers, as in the case of the CS-2) make it difficult to transfer both data and control, as active messages require. Traditional network interfaces avoid this problem because they have a single point of entry, which essentially acts as a queue. To efficiently support active messages on modern network communication co-processors, hardware primitives that support this queue behavior are required. We overcame this problem by writing specialized code that runs on the communications co-processor and supports the active messages protocol. Our implementation of active messages achieves a one-way latency of 12.3 μs and up to 39 MB/s for bulk transfers. Both numbers are close to optimal for the current Meiko hardware and are competitive with the performance of active messages on other hardware platforms.