Iterative collective loop fusion

Authors:
T. J. Ashby; M. F. P. O'Boyle
Affiliations:
Institute for Computer Systems Architecture, University of Edinburgh, Scotland, UK; Institute for Computer Systems Architecture, University of Edinburgh, Scotland, UK
Venue:
CC'06 Proceedings of the 15th international conference on Compiler Construction
Year:
2006

Abstract

Naive code generation from high-level languages that encourage modularity can produce large numbers of simple loops in array-based programs. Collective loop fusion and array contraction can be applied to such code to improve temporal locality and performance. The problem is typically formalised using a loop dependence graph (LDG), with solutions expressed as fusion partitions. Much previous work has concentrated on approaches to this abstract formulation. We present a technique, iterative collective loop fusion, that is based on empirically evaluating different transformations, and show that it can provide speedups of up to 1.38 over existing approaches. We also give results showing that applying such techniques to high-level languages can provide speedups of up to 2.45 over the original code and can outperform equivalent code written in Fortran.
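
As an illustration only, the ideas of loop fusion, array contraction, and empirical selection among fusion partitions might be sketched in Python as follows. This is not the authors' implementation: the function names (`unfused`, `partially_fused`, `fused`, `pick_fastest`) and the three-loop example are hypothetical, the three variants are written by hand rather than derived from an LDG, and interpreter timings will not reflect the cache behaviour of compiled code.

```python
import timeit


def unfused(a, b):
    """Three simple loops, as naive code generation might emit them."""
    n = len(a)
    t1 = [0.0] * n  # temporary array written by loop 1, read by loop 2
    t2 = [0.0] * n  # temporary array written by loop 2, read by loop 3
    out = [0.0] * n
    for i in range(n):
        t1[i] = a[i] + b[i]
    for i in range(n):
        t2[i] = t1[i] * 2.0
    for i in range(n):
        out[i] = t2[i] - a[i]
    return out


def partially_fused(a, b):
    """Partition {1, 2} {3}: loops 1 and 2 fused, t1 contracted to a scalar."""
    n = len(a)
    t2 = [0.0] * n  # still an array, because loop 3 remains separate
    out = [0.0] * n
    for i in range(n):
        t1 = a[i] + b[i]  # contracted temporary
        t2[i] = t1 * 2.0
    for i in range(n):
        out[i] = t2[i] - a[i]
    return out


def fused(a, b):
    """Partition {1, 2, 3}: all loops fused, t1 and t2 contracted to scalars."""
    n = len(a)
    out = [0.0] * n
    for i in range(n):
        t1 = a[i] + b[i]
        t2 = t1 * 2.0
        out[i] = t2 - a[i]
    return out


def pick_fastest(candidates, a, b, repeats=3):
    """Time every candidate on the given input and return (best_name, timings).

    Assumption: the minimum of a few repeated runs is a reasonable
    estimate of each variant's cost.
    """
    timings = {}
    for name, func in candidates.items():
        runs = timeit.repeat(lambda f=func: f(a, b), number=1, repeat=repeats)
        timings[name] = min(runs)
    best = min(timings, key=timings.get)
    return best, timings


CANDIDATES = {
    "unfused": unfused,
    "partially_fused": partially_fused,
    "fused": fused,
}
```

Fusion is legal here because every loop covers the same iteration space and each iteration `i` reads only values produced in iteration `i` of the preceding loop. Calling `pick_fastest(CANDIDATES, a, b)` on representative input returns the name of the fastest variant together with the measured times, which is the empirical step that distinguishes an iterative approach from choosing a partition by a static cost model.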