We investigate remapping multi-dimensional arrays on clusters of SMP nodes under the OpenMP, MPI, and hybrid paradigms. The traditional method of array transposition needs an auxiliary array of the same size and a copy-back stage. We recently developed an in-place method using vacancy tracking cycles. Extensive comparisons show that the vacancy tracking algorithm outperforms the traditional two-array method. Because the vacancy tracking cycles are independent of one another, the in-place method can be parallelized efficiently at the node level on SMP architectures. The performance of multi-threaded parallelism using OpenMP is tested with different scheduling methods and different numbers of threads. The vacancy tracking method is parallelized using several parallel paradigms. At the node level, pure OpenMP outperforms pure MPI by a factor of 2.76. Across the entire cluster of SMP nodes, the hybrid MPI/OpenMP implementation outperforms pure MPI by a factor of 4.44, demonstrating the validity of the parallel paradigm of mixing MPI with OpenMP.
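The idea of an in-place transpose that follows vacancy cycles can be illustrated with a minimal sequential sketch. This is not the authors' implementation: the function name `transpose_in_place`, the flat row-major layout, and the on-the-fly test for the smallest index in a cycle are assumptions made for illustration only, and a production version would more likely precompute the cycle starting points. Each cycle touches a disjoint set of positions, which is the property that would allow different threads to process different cycles.

```python
def transpose_in_place(a, rows, cols):
    """Transpose a rows x cols row-major matrix held in the flat list `a`.

    Only one saved element of extra storage is used; no auxiliary array.
    (Illustrative sketch; names and layout are assumptions.)
    """
    n = rows * cols
    if len(a) != n:
        raise ValueError("len(a) must equal rows * cols")
    m = n - 1  # positions 0 and n-1 never move

    for start in range(1, m):
        # Walk the cycle containing `start`. If it reaches a smaller index,
        # this cycle was already processed from that smaller index.
        s = start * cols % m
        while s > start:
            s = s * cols % m
        if s != start:
            continue

        # Open a vacancy at `start`, then repeatedly fill the current
        # vacancy from its source position, which becomes the new vacancy.
        saved = a[start]
        dest = start
        while True:
            src = dest * cols % m  # old index of the element that belongs at dest
            if src == start:
                a[dest] = saved    # close the cycle with the saved element
                break
            a[dest] = a[src]
            dest = src


# Usage: a 2 x 3 matrix [[0, 1, 2], [3, 4, 5]] becomes the 3 x 2 matrix
# [[0, 3], [1, 4], [2, 5]], stored flat.
matrix = [0, 1, 2, 3, 4, 5]
transpose_in_place(matrix, 2, 3)
print(matrix)  # [0, 3, 1, 4, 2, 5]
```

The source index `dest * cols % (n - 1)` follows from the row-major layout: the element at old index `r * cols + c` must end up at new index `c * rows + r`. In a threaded version, the outer loop over cycle starting points is the natural place to distribute work, and cycle lengths can differ widely, which is presumably one reason the choice of scheduling method matters.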