Existing loop fusion algorithms fuse loop nests only when doing so violates none of their dependences. This paper presents a new algorithm that can fuse loop nests in the presence of fusion-preventing anti-dependences, eliminating all such violated dependences by automatic array copying. We apply this aggressive loop fusion strategy to a Jacobi program. The performance of such iterative methods is typically limited by the speed of the memory system. Fusing the two loop nests in the Jacobi program into one reduces data cache misses and consequently improves the performance of both the sequential and parallel versions, as validated by our experiments on an HP AlphaServer SC45 supercomputer.
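
The idea can be illustrated with a minimal Python sketch on a 1-D three-point Jacobi stencil. This is not the paper's algorithm: the function names are hypothetical, the stencil is an assumed simplification, and two scalar copies stand in for the automatic array copying described above. The unfused version has a compute loop and a copy-back loop. Fusing them naively lets iteration i overwrite `a[i]` before iteration i+1 reads it, which violates an anti-dependence. Keeping a copy of the old value restores the original semantics.

```python
def jacobi_unfused(a, steps):
    """Reference version: two separate loop nests per time step."""
    a = list(a)
    n = len(a)
    b = list(a)
    for _ in range(steps):
        for i in range(1, n - 1):          # loop nest 1: compute into b
            b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
        for i in range(1, n - 1):          # loop nest 2: copy back into a
            a[i] = b[i]
    return a


def jacobi_fused_naive(a, steps):
    """Illegal fusion: a[i-1] has already been overwritten when it is read."""
    a = list(a)
    n = len(a)
    for _ in range(steps):
        for i in range(1, n - 1):
            a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return a


def jacobi_fused_with_copy(a, steps):
    """Fused loop made legal by copying the old values before they are overwritten."""
    a = list(a)
    n = len(a)
    for _ in range(steps):
        left_old = a[0]                    # copy of the pre-update a[i-1]
        for i in range(1, n - 1):
            cur_old = a[i]                 # save a[i] before it is overwritten
            a[i] = (left_old + cur_old + a[i + 1]) / 3.0
            left_old = cur_old             # becomes the old left neighbour of i+1
    return a


if __name__ == "__main__":
    data = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
    print(jacobi_unfused(data, 3) == jacobi_fused_with_copy(data, 3))
    print(jacobi_unfused(data, 3) == jacobi_fused_naive(data, 3))
```

In this sketch the fused-with-copy version makes a single pass over `a` per time step and needs no second array, which is the kind of reduced memory traffic the abstract attributes to fusion. A 2-D stencil would need a buffer of old rows rather than two scalars.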