High performance computing using MPI and OpenMP on multi-core parallel systems

Authors:
Haoqiang Jin; Dennis Jespersen; Piyush Mehrotra; Rupak Biswas; Lei Huang; Barbara Chapman
Affiliations:
NAS Division, NASA Ames Research Center, Moffett Field, CA 94035, United States (Haoqiang Jin, Dennis Jespersen, Piyush Mehrotra, Rupak Biswas); Department of Computer Sciences, University of Houston, Houston, TX 77004, United States (Lei Huang, Barbara Chapman)
Venue:
Parallel Computing
Year:
2011

Abstract

The rapidly increasing number of cores in modern microprocessors is pushing current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems, with distributed memory across nodes and shared memory with non-uniform memory access (NUMA) within each node, poses a challenge to application developers. In this paper, we study a hybrid approach to programming such systems that combines two traditional programming models, MPI and OpenMP. We present the performance of standard benchmarks from the multi-zone NAS Parallel Benchmarks, and of two full applications, using this approach on several multi-core-based systems, including an SGI Altix 4700, an IBM p575+, and an SGI Altix ICE 8200EX. We also present new data locality extensions to OpenMP that better match the hierarchical memory structure of multi-core architectures.