Exploiting processor groups to extend scalability of the GA shared memory programming model
Proceedings of the 2nd conference on Computing frontiers
High Performance Remote Memory Access Communication: The Armci Approach
International Journal of High Performance Computing Applications
Exploiting multilevel parallelism using processor groups is becoming increasingly important for programming high-end systems. This paper describes group-aware runtime support for shared/global address space programming models. The effort has been undertaken in the context of the Aggregate Remote Memory Copy Interface (ARMCI) [1], a portable runtime system used as a communication layer for Global Arrays [2], Co-Array Fortran (CAF) [3], GPSHMEM [4], Co-Array Python [5], and end-user applications. The paper describes the management of shared memory, the integration of shared memory communication and remote direct memory access (RDMA) on clusters with SMP nodes, and registration. All of these are required for efficient multi-method and multi-protocol communication on modern systems. Focus is placed on techniques for supporting process groups while maximizing communication performance and efficiently managing global memory system-wide.