Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Proceedings of the Second International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems
The Gemini interconnect on the Cray XE6 platform provides lightweight remote direct memory access (RDMA) between nodes, which is useful for implementing partitioned global address space (PGAS) languages such as UPC and Co-Array Fortran. In this paper, we study Gemini performance using a set of communication microbenchmarks and compare one-sided communication in PGAS languages with two-sided MPI. Our results demonstrate the performance benefits of the PGAS model on Gemini hardware, showing in which circumstances and by how much one-sided communication outperforms two-sided communication in messaging rate, aggregate bandwidth, and the ability to overlap computation with communication. For example, for 8-byte and 2 KB messages the one-sided messaging rate is 5 and 10 times greater, respectively, than the two-sided rate. The study also yields practical guidance on how to optimize one-sided communication on Gemini.