Efficient Parallelization using Combined Loop and Data Transformations

Authors:
M. F. P. O'Boyle; P. M. W. Knijnenburg
Venue:
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Year:
1999

Citing: 0
Cited by: 3

Integrating loop and data transformations for global optimization (Journal of Parallel and Distributed Computing)
Improving whole-program locality using intra-procedural and inter-procedural transformations (Journal of Parallel and Distributed Computing)
Improving last level cache locality by integrating loop and data transformations (Proceedings of the International Conference on Computer-Aided Design)

Abstract

This paper aims to minimize parallelization overhead on distributed shared memory machines, such as the SGI Origin 2000, by combining non-singular loop and data transformations. We show that conflicting requirements on a loop transformation may be resolved by a data transformation, and vice versa. We develop optimization criteria for locality, synchronization, and communication, and show that neither loop nor data transformations alone suffice for efficient parallelization. This leads to a novel global optimization heuristic, which is applied to three SPEC kernels, where it outperforms techniques based solely on loop or data transformations and gives a significant improvement over an existing state-of-the-art commercial auto-parallelizer.
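
The claim that a data transformation can resolve conflicting requirements on a loop transformation can be illustrated with a toy example. The sketch below is not the paper's heuristic. It assumes the common access-matrix view, in which a reference with access matrix U becomes D·U·T⁻¹ after a loop transformation T and a data transformation D, and it assumes row-major storage, so a reference has good spatial locality when the innermost loop moves only the last subscript. The loop nest (A[i][j] = B[j][i]) and all names (`matmul`, `innermost_is_stride_one`, `locality`) are hypothetical.

```python
# Toy model: a 2-deep loop nest over (i, j) with references A[i][j] and B[j][i].
# Each reference is described by an access matrix U: subscripts = U * (i, j).

IDENTITY = [[1, 0], [0, 1]]
SWAP = [[0, 1], [1, 0]]  # loop interchange, or array transpose; its own inverse

REFS = {"A": IDENTITY, "B": SWAP}  # A[i][j] and B[j][i]


def matmul(a, b):
    """Multiply two small integer matrices given as lists of rows."""
    return [
        [sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
        for r in range(len(a))
    ]


def innermost_is_stride_one(access):
    """Row-major assumption: the innermost loop (last column of the access
    matrix) should change only the last subscript, by one element."""
    last_col = [row[-1] for row in access]
    return all(v == 0 for v in last_col[:-1]) and abs(last_col[-1]) == 1


def locality(loop_t_inv, data_t):
    """Report, per array, whether D * U * T^-1 is stride-one in the inner loop.
    data_t maps an array name to its data transformation (identity if absent)."""
    return {
        name: innermost_is_stride_one(
            matmul(data_t.get(name, IDENTITY), matmul(u, loop_t_inv))
        )
        for name, u in REFS.items()
    }


if __name__ == "__main__":
    print("original loop order:  ", locality(IDENTITY, {}))
    print("loops interchanged:   ", locality(SWAP, {}))
    print("B stored transposed:  ", locality(IDENTITY, {"B": SWAP}))
```

In this toy case neither loop order gives both references stride-one access, while keeping the original loop order and transposing the layout of B satisfies both. This is one small instance of the interplay the abstract describes, under the stated assumptions.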