With the trend in the supercomputing world shifting from homogeneous machine architectures to hybrid clusters of SMP nodes, the interoperability of OpenMP and MPI has become a key issue in understanding and optimizing overall system performance. While the low-level performance of MPI and OpenMP can each be evaluated using existing benchmarks, the combination of the two poses new challenges. A performance study of different hybrid programming paradigms is therefore of high benefit to both vendors and the user community. As part of our project, we have identified several possible combinations of the two models in order to provide qualitative and quantitative justification of the situations in which each is to be favoured. Collective operations are particularly important to analyze and evaluate on a hybrid platform, and we therefore concentrate our study on three of them -- barrier, all-to-all, and all-reduce. Issues such as the optimal mix of OpenMP and MPI, the most efficient way of managing MPI communication from within OpenMP, the optimal unit of communication, and the degree of overlap between computation and communication need to be evaluated. The performance results supporting this investigation were collected on the IBM Power-3 machine at the San Diego Supercomputer Center using our suite of hybrid microbenchmarks.