IEEE Transactions on Parallel and Distributed Systems
This paper focuses on load balancing issues associated with the hybrid parallelisation of tiled algorithms on SMP clusters. The intrinsic load imbalance in hybrid parallelisation arises because message passing libraries often provide limited multi-threading support, allowing only the master thread to perform inter-node message passing communication. To mitigate this effect, we propose a generic method for applying load balancing to the coarse-grain hybrid model, so that the load is distributed appropriately among the working threads. We investigate both static and dynamic load balancing, and experimentally evaluate three balancing variations on kernel benchmarks.
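The idea behind static balancing can be illustrated with a minimal sketch. It is not the method proposed in the paper, whose details the abstract does not give. It assumes that the master thread's communication cost per tile is known in advance and can be expressed in iteration-equivalent units. The names `balanced_split`, `makespan`, and `comm_cost` are hypothetical. The master thread is given fewer iterations so that its computation plus communication roughly matches the computation of every other thread.

```python
def balanced_split(total_iters, num_threads, comm_cost):
    """Return per-thread iteration counts; index 0 is the master thread.

    Assumption: comm_cost is the master's communication time per tile,
    measured in units of one loop iteration (a hypothetical cost model).
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    if num_threads == 1:
        return [total_iters]
    # Ideal share w satisfies: master_iters + comm_cost == w and
    # master_iters + (num_threads - 1) * w == total_iters.
    worker_share = (total_iters + comm_cost) // num_threads
    # If communication dominates, the master does no computation at all.
    master_iters = max(0, worker_share - comm_cost)
    remaining = total_iters - master_iters
    # Spread the remaining iterations as evenly as possible.
    base, extra = divmod(remaining, num_threads - 1)
    workers = [base + (1 if i < extra else 0) for i in range(num_threads - 1)]
    return [master_iters] + workers


def makespan(split, comm_cost):
    """Time of the slowest thread, with the master also paying comm_cost."""
    return max([split[0] + comm_cost] + split[1:])


if __name__ == "__main__":
    balanced = balanced_split(1000, 4, 200)
    naive = [250, 250, 250, 250]
    print(balanced, makespan(balanced, 200))
    print(naive, makespan(naive, 200))
```

Under this cost model, an equal split leaves the three non-master threads idle while the master communicates, whereas the adjusted split lets all threads finish at about the same time. A dynamic variant would instead measure the communication cost at run time and readjust the split between tiles.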