This article focuses on the effect of process topology and load balancing on the performance of various programming models for SMP clusters running iterative algorithms. More specifically, we consider nested loop algorithms with constant flow dependencies that can be parallelized on SMP clusters with the aid of the tiling transformation. We investigate three parallel programming models: a popular monolithic message-passing implementation and two hybrid ones that combine message passing with multi-threading. We conclude that the choice of mapping topology for the mesh of processes has a significant effect on overall performance, and we provide an algorithm that derives an efficient topology from the iteration space and data dependencies of the algorithm. We also propose static load balancing techniques for distributing the computation among threads; these mitigate the penalty incurred when the master thread must perform all inter-process communication because of limitations often imposed by the message-passing library. Both improvements are implemented as compile-time optimizations and evaluated experimentally. Finally, we provide an overall comparison of these parallel programming styles on SMP clusters based on a micro-kernel experimental evaluation.
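To make the hybrid model and the load-balancing idea concrete, the following is a minimal sketch, assuming an MPI library restricted to the funneled threading level (only the master thread may call MPI) and OpenMP threads sharing the computation of one tile. The function compute_tile_rows(), the tile size, and the smaller share assigned to the master thread are illustrative assumptions, not the authors' actual kernel or ratios.

```c
/* Hypothetical sketch of a hybrid MPI+OpenMP (funneled) tile computation:
 * all threads compute rows of the current tile, but only the master thread
 * performs inter-process communication, so it is statically assigned a
 * smaller slice of the work. Names and ratios are assumptions. */
#include <mpi.h>
#include <omp.h>

/* Placeholder for the per-row work of a nested loop with constant
 * flow dependencies (assumed). */
static void compute_tile_rows(int first_row, int last_row)
{
    (void)first_row;
    (void)last_row;
}

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int tile_rows = 1024;                 /* rows per tile (assumed) */
    const int nthreads  = omp_get_max_threads();

    /* Static load balancing: the master thread also handles all MPI calls,
     * so give it roughly half a worker's share of rows (assumed ratio). */
    const int master_share = tile_rows / (2 * nthreads);
    const int workers      = (nthreads > 1) ? nthreads - 1 : 1;
    const int worker_share = (tile_rows - master_share) / workers;

    #pragma omp parallel
    {
        const int tid = omp_get_thread_num();
        int first, last;
        if (tid == 0) {
            first = 0;
            last  = master_share;
        } else {
            first = master_share + (tid - 1) * worker_share;
            last  = first + worker_share;
        }

        compute_tile_rows(first, last);

        #pragma omp barrier
        /* Funneled model: only the master thread exchanges tile boundaries
         * with neighbouring processes of the process topology,
         * e.g. via MPI_Sendrecv with rank-1 and rank+1. */
        #pragma omp master
        {
            /* boundary exchange would go here */
        }
    }

    MPI_Finalize();
    return 0;
}
```

In this sketch the uneven static split is what compensates for the master thread's extra communication duty; the monolithic message-passing model discussed in the article would instead run one MPI process per core with no intra-node threading.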