Scaling communication-intensive applications on BlueGene/P using one-sided communication and overlap

Authors:
Rajesh Nishtala (Computer Science Division, College of Engineering, University of California at Berkeley, USA)
Paul H. Hargrove (High Performance Computing Research Department, Lawrence Berkeley National Laboratory, CA, USA)
Dan O. Bonachea (Computer Science Division, College of Engineering, University of California at Berkeley, USA)
Katherine A. Yelick (Computer Science Division, College of Engineering, University of California at Berkeley, USA)
Venue:
IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
Year:
2009


Abstract

In earlier work, we showed that the one-sided communication model found in PGAS languages (such as UPC) offers significant advantages in communication efficiency by decoupling data transfer from processor synchronization. We explore the use of the PGAS model on IBM BlueGene/P, an architecture that combines low-power, quad-core processors with extreme scalability. We demonstrate that the PGAS model, using a new port of the Berkeley UPC compiler and the GASNet one-sided communication layer, outperforms two-sided (MPI) communication both in microbenchmarks and in a case study of NAS FT, a communication-limited benchmark. We scale the benchmark up to 16,384 cores of the BlueGene/P and demonstrate that UPC consistently outperforms MPI, by as much as 66% for some processor configurations and by 32% on average. In addition, the results demonstrate the scalability of the PGAS model and of the Berkeley implementation of UPC, the viability of using it on machines with multicore nodes, and the effectiveness of the BG/P communication layer in supporting one-sided communication and PGAS languages.
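To make the idea of decoupling data transfer from synchronization concrete, here is a minimal Python sketch, not the authors' UPC or GASNet code, of an all-to-all exchange (the communication pattern behind the transposes in NAS FT) written in a one-sided style. It assumes a simulated partitioned address space in which each of `NRANKS` threads owns one receive buffer that any other thread may write into directly. The names `put_nb`, `compute_chunk`, and the thread pool standing in for the network are hypothetical. Each rank issues a non-blocking put as soon as a chunk is ready and moves on to the next chunk. Its only synchronization is one wait on its own put handles plus a single barrier, rather than a matched receive per message as in two-sided MPI.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

NRANKS = 4  # number of simulated processes (hypothetical)
CHUNK = 2   # elements each rank sends to every rank, itself included

# Simulated partitioned global address space: recv_buf[r] "lives" on rank r,
# but any rank may write into it directly at an offset it computes itself.
recv_buf = [[None] * (NRANKS * CHUNK) for _ in range(NRANKS)]
barrier = threading.Barrier(NRANKS)
# Hypothetical stand-in for a network that moves data in the background.
nic = ThreadPoolExecutor(max_workers=NRANKS)


def put_nb(dest_rank, offset, data):
    """Hypothetical non-blocking put: start the copy, return a handle at once."""
    def copy():
        recv_buf[dest_rank][offset:offset + len(data)] = data
    return nic.submit(copy)


def compute_chunk(me, dest):
    """Hypothetical local work (e.g. one slab of FFTs) destined for `dest`."""
    return [me * 100 + dest * 10 + k for k in range(CHUNK)]


def rank_main(me):
    handles = []
    for dest in range(NRANKS):
        chunk = compute_chunk(me, dest)
        # Issue the transfer as soon as this chunk is ready, then go on to
        # compute the next one: communication is overlapped with computation.
        # The target rank posts no receive; the sender chooses the offset.
        handles.append(put_nb(dest, me * CHUNK, chunk))
    for h in handles:
        h.result()  # wait until every put this rank issued has completed
    # A single synchronization point for the whole exchange.
    barrier.wait()


threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(NRANKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
nic.shutdown()

print(recv_buf[1])  # → [10, 11, 110, 111, 210, 211, 310, 311]
```

Because of CPython's global interpreter lock, the threads do not truly run in parallel. The sketch therefore shows the semantics (who writes where, and when completion is guaranteed) rather than the performance benefit measured on BlueGene/P.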