When using a shared memory multiprocessor, the programmer must select the portable programming model that will deliver the best performance. Even restricting the choice to the standard programming environments (MPI and OpenMP) still leaves a broad range of programming approaches. To help programmers with this selection, we compare MPI with three OpenMP programming styles (loop level, loop level with large parallel sections, and SPMD) using a subset of the NAS benchmarks (CG, MG, FT, LU), two dataset sizes (A and B), and two shared memory multiprocessors (IBM SP3 NightHawk II, SGI Origin 3800). We also present a path from MPI to OpenMP SPMD that guides programmers starting from an existing MPI code. We present the first SPMD OpenMP version of the NAS benchmarks and compare it with other OpenMP versions from independent sources (PBN, SDSC, and RWCP). Experimental results demonstrate that OpenMP provides competitive performance compared with MPI over a large set of experimental conditions. However, this performance comes at the price of a substantial programming effort in dataset adaptation and inter-thread communication. MPI still provides the best performance under some conditions. We present breakdowns of the execution times and measurements of hardware performance counters to explain the performance differences.