Improving the parallelism of iterative methods by aggressive loop fusion

Authors:
Jingling Xue; Minyi Guo; Daming Wei
Affiliations:
Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales, Sydney, Australia 2052; School of Computer Science and Engineering, The University of Aizu, Fukushima, Japan 965-8580; School of Computer Science and Engineering, The University of Aizu, Fukushima, Japan 965-8580
Venue:
The Journal of Supercomputing
Year:
2008

Abstract

Traditionally, loop nests are fused only when fusion does not violate any of the data dependences between them. This paper presents a new loop fusion algorithm that can fuse loop nests even in the presence of fusion-preventing anti-dependences; every violated anti-dependence is removed by automatic array copying. The performance of iterative methods is typically limited by the speed of the memory system. As a case study, this aggressive loop fusion strategy is applied to a Jacobi solver: fusing its two loop nests into one reduces data cache misses and consequently improves the performance of both the sequential and parallel versions of the Jacobi program, as validated by our experimental results on an HP AlphaServer SC45 supercomputer.
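The fusion-preventing anti-dependence in a Jacobi solver can be illustrated with a minimal Python sketch. It assumes a standard five-point Jacobi sweep in which one loop nest computes new values into `b` from the old values in `a`, and a second loop nest copies `b` back into `a`. The grid size, the initial data, and all function names (`jacobi_unfused`, `jacobi_fused_naive`, `jacobi_fused_copy`, `make_grid`, `run`) are hypothetical. The copying scheme shown, a one-row buffer plus one scalar holding saved old values, is just one simple way to remove the violated anti-dependences by array copying; it is not necessarily the algorithm of the paper.

```python
# Hypothetical illustration, not the paper's algorithm: one Jacobi sweep on an
# n x n grid whose boundary values stay fixed, written three ways.

def jacobi_unfused(a, b):
    """Original form: two separate loop nests."""
    n = len(a)
    # Loop nest 1: compute new interior values into b from old values in a.
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            b[i][j] = 0.25 * (a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1])
    # Loop nest 2: copy the new values back into a.
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            a[i][j] = b[i][j]


def jacobi_fused_naive(a, b):
    """Illegal fusion: a[i-1][j] and a[i][j-1] were already overwritten when
    iteration (i, j) reads them, so two anti-dependences are violated."""
    n = len(a)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            b[i][j] = 0.25 * (a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1])
            a[i][j] = b[i][j]


def jacobi_fused_copy(a, b):
    """Fusion made legal by copying old values of a before overwriting them.
    prev_row[j] holds the old a[i-1][j]; left holds the old a[i][j-1]."""
    n = len(a)
    prev_row = list(a[0])  # row 0 is boundary, so its current values are old
    for i in range(1, n - 1):
        left = a[i][0]  # column 0 is boundary and never written
        for j in range(1, n - 1):
            old = a[i][j]  # save the old value before it is overwritten
            # a[i+1][j] and a[i][j+1] have not been written yet, so they are old.
            b[i][j] = 0.25 * (prev_row[j] + a[i+1][j] + left + a[i][j+1])
            a[i][j] = b[i][j]
            prev_row[j] = old  # serves as a[i-1][j] when processing row i + 1
            left = old         # serves as a[i][j-1] for column j + 1


def make_grid(n):
    """Hypothetical test data: boundary fixed at 1.0, interior starts at 0.0."""
    return [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0
             for j in range(n)] for i in range(n)]


def run(sweep, n=8, iters=5):
    """Apply `sweep` `iters` times to a fresh grid and return the result."""
    a, b = make_grid(n), make_grid(n)
    for _ in range(iters):
        sweep(a, b)
    return a


if __name__ == "__main__":
    print(run(jacobi_fused_copy) == run(jacobi_unfused))                   # True
    print(run(jacobi_fused_naive, iters=1) == run(jacobi_unfused, iters=1))  # False
```

The naive fused loop silently turns the Jacobi sweep into a Gauss-Seidel-like sweep, because it reads neighbours that have already been updated. Saving each old value in `prev_row` or `left` just before it is overwritten restores the original semantics while keeping a single loop nest, so the arrays are traversed in one pass instead of two.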