When using a shared memory multiprocessor, the programmer must select the portable programming model that will deliver the best performance. Even restricting the choice to the standard programming environments (MPI and OpenMP) still leaves a broad range of programming approaches. To help programmers with this selection, we compare MPI with three OpenMP programming styles (loop level, loop level with large parallel sections, and SPMD) using a subset of the NAS benchmarks (CG, MG, FT, LU), two dataset sizes (A and B), and two shared memory multiprocessors (IBM SP3 NightHawk II, SGI Origin 3800). We also present a path from MPI to OpenMP SPMD that guides programmers starting from an existing MPI code. We present the first SPMD OpenMP version of the NAS benchmarks and compare it with other OpenMP versions from independent sources (PBN, SDSC, and RWCP). Experimental results demonstrate that OpenMP provides competitive performance compared with MPI over a large set of experimental conditions. However, this performance comes at the price of a substantial programming effort in dataset adaptation and inter-thread communication. MPI still provides the best performance under some conditions. We present breakdowns of the execution times and measurements of hardware performance counters to explain the performance differences.