With the trend in the supercomputing world shifting from homogeneous machine architectures to hybrid clusters of SMP nodes, the interoperability of OpenMP and MPI has become a key issue in understanding and optimizing overall system performance. While the low-level performance of MPI and OpenMP can each be evaluated using existing benchmarks, the combination of the two poses new challenges. A performance study of different hybrid programming paradigms is therefore of high benefit to both vendors and the user community. As part of our project, we have identified several possible combinations of the two models in order to provide qualitative and quantitative justification of the situations in which each is to be favoured. Collective operations are particularly important to analyze and evaluate on a hybrid platform, and we therefore concentrate our study on three of them -- barrier, all-to-all, and all-reduce. Issues such as the optimal mix of OpenMP and MPI, the most efficient way of managing MPI communication from within OpenMP, the optimal unit of communication, and the degree of overlap between computation and communication need to be evaluated. The performance results supporting this investigation were collected on the IBM Power-3 machine at the San Diego Supercomputer Center using our suite of hybrid microbenchmarks.