Removing impediments to loop fusion through code transformations

Authors:
Bob Blainey;Christopher Barton;José Nelson Amaral
Affiliations:
IBM Toronto Software Laboratory, Toronto, Canada;Department of Computing Science, University of Alberta, Edmonton, Canada;Department of Computing Science, University of Alberta, Edmonton, Canada
Venue:
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Year:
2002

Abstract

Loop fusion is a common optimization technique that combines several loops into a single larger loop. Most existing work on loop fusion concentrates on the heuristics required to optimize an objective function, such as data reuse or the creation of opportunities for instruction-level parallelism. Often, however, the code provided to a compiler has only small sets of loops that are control-flow equivalent, normalized, have the same iteration count, are adjacent, and have no fusion-preventing dependences. This paper focuses on code transformations that create more opportunities for loop fusion in the IBM® XL compiler suite, which generates code for the IBM family of PowerPC® processors. In this compiler, an objective function is used at the loop distributor to decide which portions of a loop should remain in the same loop nest and which portions should be redistributed. Our algorithm focuses on eliminating conditions that prevent loop fusion. By generating maximal fusion, our algorithm increases the scope of later transformations. We tested our improved code generator on an IBM pSeries™ 690 machine equipped with a POWER4™ processor using the SPEC CPU2000 benchmark suite. Our improvements to loop fusion resulted in three times as many loops fused in a subset of CFP2000 benchmarks, and four times as many for a subset of CINT2000 benchmarks.
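As a rough illustration of the fusion conditions listed in the abstract, the sketch below shows two adjacent loops with the same iteration count being fused, followed by a pair whose iteration counts differ by one. For the second pair, peeling the extra iteration is used here as one generic way to make the counts match. The function names and the choice of peeling are assumptions made for this example; the abstract does not describe which transformations the paper applies, and the sketch is written in Python only for readability, not because the compiler operates on Python.

```python
def separate_loops(a, b):
    """Two adjacent loops, same iteration count, no fusion-preventing dependence."""
    n = len(a)
    c = [0] * n
    d = [0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
    for i in range(n):
        d[i] = c[i] * 2      # reads only c[i], which the first loop has already written
    return c, d


def fused_loop(a, b):
    """Hand-fused version: one pass over the data, c[i] is reused immediately."""
    n = len(a)
    c = [0] * n
    d = [0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
        d[i] = c[i] * 2
    return c, d


def mismatched_loops(a):
    """Impediment: the first loop runs n times, the second only n - 1 times."""
    n = len(a)
    c = [0] * n
    d = [0] * max(n - 1, 0)
    for i in range(n):
        c[i] = a[i] + 1
    for i in range(n - 1):
        d[i] = c[i] * 2
    return c, d


def peeled_and_fused(a):
    """Hypothetical fix: peel the last iteration of the longer loop, then fuse."""
    n = len(a)
    c = [0] * n
    d = [0] * max(n - 1, 0)
    for i in range(n - 1):   # both loop bodies now share the same bounds
        c[i] = a[i] + 1
        d[i] = c[i] * 2
    if n > 0:
        c[n - 1] = a[n - 1] + 1   # peeled iteration, executed after the fused loop
    return c, d
```

In both pairs the fused version computes the same results as the original loops. Fusion would not be legal if the second loop read `c[i + 1]` instead of `c[i]`, since the fused body would then read a value before it is written; that is an example of a fusion-preventing dependence.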