Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Proceedings of the Second International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems
The Gemini interconnect on the Cray XE6 platform provides lightweight remote direct memory access (RDMA) between nodes, which is useful for implementing partitioned global address space (PGAS) languages such as UPC and Co-Array Fortran. In this paper, we study Gemini performance using a set of communication microbenchmarks and compare one-sided communication in PGAS languages with two-sided MPI. Our results demonstrate the performance benefits of the PGAS model on Gemini hardware, showing in which circumstances and by how much one-sided communication outperforms two-sided communication in messaging rate, aggregate bandwidth, and the ability to overlap computation with communication. For example, for 8-byte and 2 KB messages the one-sided messaging rate is 5 and 10 times greater, respectively, than the two-sided rate. The study also yields practical guidance on how to optimize one-sided communication on Gemini.