The Message Passing Interface (MPI) is one of the most widely used programming models for parallel computing. However, the amount of memory available to an MPI process is limited by the amount of local memory within a compute node. Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) are growing in popularity because of their ability to provide a shared global address space that spans the memories of multiple compute nodes. However, taking advantage of UPC can require a large recoding effort for existing parallel applications. In this paper, we explore a new hybrid parallel programming model that combines MPI and UPC. This model allows MPI programmers incremental access to a greater amount of memory, enabling memory-constrained MPI codes to process larger data sets. In addition, the hybrid model offers UPC programmers an opportunity to create static UPC groups that are connected over MPI. As we demonstrate, the use of such groups can significantly improve the scalability of locality-constrained UPC codes. This paper presents a detailed description of the hybrid model and demonstrates its effectiveness in two applications: a random access (RA) benchmark and the Barnes-Hut cosmological simulation. Experimental results indicate that the hybrid model can greatly enhance performance: with hybrid UPC groups spanning two cluster nodes, RA performance increases by a factor of 1.33, and with groups spanning four cluster nodes, Barnes-Hut achieves a twofold speedup at the expense of a 2% increase in code size.
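The sketch below illustrates the kind of structure the abstract describes: UPC threads within a group cooperate through the shared global address space, while one designated thread per group exchanges group-level results with other groups over MPI. It is a minimal sketch under several assumptions not stated in the abstract: the group leader is UPC thread 0, only leaders call MPI, and MPI_COMM_WORLD is taken to contain one rank per UPC group. The paper's hybrid runtime may organize initialization and communicators differently.

```c
/*
 * Hedged sketch of a hybrid MPI+UPC program: a group-local reduction in
 * UPC followed by an inter-group reduction over MPI performed by the
 * group leader (assumed to be UPC thread 0).  Not the paper's runtime;
 * communicator setup and launch details are assumptions.
 */
#include <upc.h>
#include <mpi.h>
#include <stdio.h>

#define N 1024

shared double data[N];     /* distributed across the UPC threads of this group */
shared double group_sum;   /* group-level accumulator, affinity to thread 0    */

int main(int argc, char **argv)
{
    int i;
    double local_sum = 0.0, global_sum = 0.0;
    upc_lock_t *lock = upc_all_lock_alloc();   /* collective lock allocation */

    /* Each thread fills and sums the elements it has affinity to. */
    upc_forall (i = 0; i < N; i++; &data[i]) {
        data[i] = (double)i;
        local_sum += data[i];
    }
    upc_barrier;

    /* Combine per-thread partial sums inside the UPC group. */
    upc_lock(lock);
    group_sum += local_sum;
    upc_unlock(lock);
    upc_barrier;

    /* Group leader exchanges the group result with the other groups
     * over MPI; MPI_COMM_WORLD is assumed to span one rank per group. */
    if (MYTHREAD == 0) {
        MPI_Init(&argc, &argv);
        double gsum = group_sum;
        MPI_Allreduce(&gsum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
        printf("group sum = %f, global sum = %f\n", gsum, global_sum);
        MPI_Finalize();
    }
    upc_barrier;
    return 0;
}
```

In this style, an existing memory-constrained MPI code could keep its MPI structure and incrementally move its largest data structures into UPC shared arrays within a group, while a locality-constrained UPC code could confine fine-grained shared accesses to node-sized groups and use coarser MPI messages between groups.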