Partitioned Global Address Space (PGAS) languages aim to improve programmer productivity on large-scale systems. The IBM Asynchronous PGAS (APGAS) runtime supports several high-productivity programming languages, including UPC, X10, and CAF. The runtime is designed for scalability and performance portability, and it includes optimized implementations for the LAPI and Blue Gene DCMF communication subsystems. This paper presents an optimized implementation of the IBM APGAS runtime for Myrinet networks, built on top of the MX communication library. It explains the challenges of implementing the one-sided communication model of APGAS on top of a two-sided communication API such as MX. We show that our implementation outperforms the Berkeley GASNet runtime in both latency and bandwidth. We also demonstrate the scalability of several HPC benchmarks on up to 1024 processes.
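The core difficulty of layering a one-sided model over a two-sided API is that every transfer still needs a matching receive on the target, so the target must make progress on the initiator's behalf. The following is a minimal sketch of that idea, assuming Python threads and `queue.Queue` stand in for two-sided send/receive channels. The names `Process`, `put`, and `get` are hypothetical; this is not the APGAS or MX API and does not reproduce the paper's actual protocol. It only shows why an emulated put or get cannot complete without a target-side progress loop.

```python
import queue
import threading


class Process:
    """A simulated process: a memory segment plus a two-sided inbox.

    The inbox is a hypothetical stand-in for posted receives on a
    two-sided messaging API; it is not the MX interface.
    """

    def __init__(self, rank, segment_size):
        self.rank = rank
        self.segment = bytearray(segment_size)
        self.inbox = queue.Queue()
        self._thread = threading.Thread(target=self._progress, daemon=True)
        self._thread.start()

    def _progress(self):
        # Target-side progress loop: a "one-sided" put or get only completes
        # because this loop receives the message and acts on it.
        while True:
            msg = self.inbox.get()
            kind = msg[0]
            if kind == "STOP":
                break
            if kind == "PUT":
                _, offset, data, reply = msg
                self.segment[offset:offset + len(data)] = data
                reply.put("ACK")  # tell the initiator the write is visible
            elif kind == "GET":
                _, offset, length, reply = msg
                reply.put(bytes(self.segment[offset:offset + length]))

    def stop(self):
        # Ask the progress loop to exit, then wait for the thread to finish.
        self.inbox.put(("STOP",))
        self._thread.join()


def put(target, offset, data):
    """Emulated one-sided put: send the data, then wait for the target's ack."""
    reply = queue.Queue(maxsize=1)
    target.inbox.put(("PUT", offset, bytes(data), reply))
    reply.get()  # blocks until the target's progress loop has applied the write


def get(target, offset, length):
    """Emulated one-sided get: send a request, then wait for the data reply."""
    reply = queue.Queue(maxsize=1)
    target.inbox.put(("GET", offset, length, reply))
    return reply.get()


if __name__ == "__main__":
    p1 = Process(rank=1, segment_size=16)
    put(p1, 4, b"pgas")
    print(get(p1, 4, 4))  # prints b'pgas'
    p1.stop()
```

Here the initiator never touches the target's `segment` directly; the write happens only when the target's loop handles the message. A real runtime would replace the dedicated thread and blocking acknowledgements with polling, asynchronous completion, and buffer management, which is where the performance trade-offs arise.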