Hybrid parallel programming with MPI and Unified Parallel C

  • Authors:
  • James Dinan; Pavan Balaji; Ewing Lusk; P. Sadayappan; Rajeev Thakur

  • Affiliations:
  • The Ohio State University, Columbus, OH, USA; Argonne National Laboratory, Argonne, IL, USA; Argonne National Laboratory, Argonne, IL, USA; The Ohio State University, Columbus, OH, USA; Argonne National Laboratory, Argonne, IL, USA

  • Venue:
  • Proceedings of the 7th ACM International Conference on Computing Frontiers
  • Year:
  • 2010

Abstract

The Message Passing Interface (MPI) is one of the most widely used programming models for parallel computing. However, the amount of memory available to an MPI process is limited by the amount of local memory within a compute node. Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) are growing in popularity because of their ability to provide a shared global address space that spans the memories of multiple compute nodes. However, taking advantage of UPC can require a large recoding effort for existing parallel applications. In this paper, we explore a new hybrid parallel programming model that combines MPI and UPC. This model gives MPI programmers incremental access to a greater amount of memory, enabling memory-constrained MPI codes to process larger data sets. In addition, the hybrid model offers UPC programmers an opportunity to create static UPC groups that are connected over MPI. As we demonstrate, the use of such groups can significantly improve the scalability of locality-constrained UPC codes. This paper presents a detailed description of the hybrid model and demonstrates its effectiveness in two applications: a random access (RA) benchmark and the Barnes-Hut cosmological simulation. Experimental results indicate that the hybrid model can greatly enhance performance: using hybrid UPC groups that span two cluster nodes, RA performance increases by a factor of 1.33; using groups that span four cluster nodes, Barnes-Hut experiences a twofold speedup at the expense of a 2% increase in code size.
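
To make the programming model concrete, the sketch below illustrates one way a hybrid MPI+UPC program might be structured. It is not taken from the paper; it is a minimal, hypothetical example assuming a "funneled" configuration in which each UPC group of THREADS threads is presented to MPI as a single rank and only UPC thread 0 of a group makes MPI calls. The constant N_PER_THREAD and the summation kernel are illustrative placeholders.

    /* Hypothetical sketch of a hybrid MPI+UPC program (funneled model).
     * Assumptions: the UPC runtime exposes each group as one MPI rank,
     * and only UPC thread 0 of the group enters the MPI layer. */
    #include <upc.h>
    #include <mpi.h>
    #include <stdio.h>

    #define N_PER_THREAD 1000   /* illustrative problem size per thread */

    /* Shared array spanning the memories of all nodes in the UPC group,
     * so each thread can address more data than one node holds locally. */
    shared double *data;

    int main(int argc, char **argv)
    {
        int i;
        double group_sum = 0.0, global_sum = 0.0;

        if (MYTHREAD == 0)          /* funneled: one MPI endpoint per group */
            MPI_Init(&argc, &argv);
        upc_barrier;

        /* Collective allocation of a cyclically distributed shared array. */
        data = (shared double *) upc_all_alloc(THREADS * N_PER_THREAD,
                                               sizeof(double));

        /* Each UPC thread initializes the elements it has affinity to. */
        upc_forall (i = 0; i < THREADS * N_PER_THREAD; i++; &data[i])
            data[i] = (double) i;
        upc_barrier;

        if (MYTHREAD == 0) {
            /* Reduce within the group through the global address space... */
            for (i = 0; i < THREADS * N_PER_THREAD; i++)
                group_sum += data[i];

            /* ...then combine per-group results across groups over MPI. */
            MPI_Allreduce(&group_sum, &global_sum, 1, MPI_DOUBLE,
                          MPI_SUM, MPI_COMM_WORLD);
            printf("global sum = %f\n", global_sum);

            upc_free(data);
            MPI_Finalize();
        }
        upc_barrier;
        return 0;
    }

In this sketch, intra-group data sharing uses the UPC global address space (giving MPI codes access to memory beyond a single node), while inter-group communication goes through ordinary MPI collectives, mirroring the division of labor the hybrid model describes.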