Improving Locality in the Parallelization of Doacross Loops (Research Note)
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
A GSA-based compiler infrastructure to extract parallelism from complex loops
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Evaluation of Hierarchical Mesh Reorderings
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
An inspector-executor algorithm for irregular assignment parallelization
ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
Hi-index | 0.00 |
The goal of this work is the efficient parallel execution of loops with indirect array accesses, in order to be embedded in a parallelizing compiler framework. In this kind of loop pattern, dependences can not always be determined at compile-time as, in many cases, they involve input data that are only known at run-time and/or the access pattern is too complex to be analyzed. In this paper we propose run-time strategies for the parallelization of these loops. Our approaches focus not only on extracting parallelism among iterations of the loop, but also on exploiting data access locality to improve memory hierarchy behavior and, thus, theoverall program speedup. Two strategies are proposed: one based on graph partitioning techniques and other based on a block-cyclic distribution. Experimental results show that both strategies are complementary and the choice of the best alternative depends on some features of the loop pattern.