Performance characteristics of the multi-zone NAS parallel benchmarks
Journal of Parallel and Distributed Computing - Special issue: 18th International parallel and distributed processing symposium
Anatomy of high-performance matrix multiplication
ACM Transactions on Mathematical Software (TOMS)
Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes
PDP '09 Proceedings of the 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing
Parallelization of a Vector-Optimized 3-D Flow Solver for Multi-core Node Clusters
HPCMP-UGC '10 Proceedings of the 2010 DoD High Performance Computing Modernization Program Users Group Conference
Today most systems in high-performance computing (HPC) feature a hierarchical hardware design: shared-memory nodes with several multi-core CPUs are connected via a network infrastructure. When parallelizing an application for these architectures, it seems natural to employ a hierarchical programming model, such as a combination of MPI and OpenMP. Nevertheless, it is widely believed that pure MPI outperforms the hybrid MPI/OpenMP approach. In this paper, we describe the hybrid MPI/OpenMP parallelization of the IR3D (Incompressible Realistic 3-D) code, a full-scale real-world application that simulates environmental effects on the evolution of vortices trailing behind control surfaces of underwater vehicles. We discuss the performance, scalability, and limitations of the pure MPI version of the code on a variety of hardware platforms and show how the hybrid approach can help to overcome certain limitations.