Automatic code generation for executing tiled nested loops onto parallel architectures

Authors:
Georgios Goumas; Maria Athanasaki; Nectarios Koziris
Affiliations:
National Technical University of Athens (all authors)
Venue:
Proceedings of the 2002 ACM symposium on Applied computing
Year:
2002

Abstract

This paper presents a novel approach to generating tiled code for nested for-loops. Tiling, or supernode transformation, has been widely used to improve locality in multi-level memory hierarchies as well as to execute loops efficiently on non-uniform memory access architectures. However, automatic code generation for tiled loops can be a very complex task for a compiler because of non-rectangular tile shapes and iteration space bounds. Our method considerably enhances previous work on rewriting tiled loops by handling parallelepiped tiles and arbitrary iteration space shapes. The complexity of code generation for a tiling transformation is thereby reduced to that of code generation for any linear transformation. Experimental results comparing all previously presented approaches show that code generation with the proposed approach is significantly faster.
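
As a rough illustration of the tiling (supernode) transformation the abstract refers to, and not of the paper's code generation method, the sketch below assigns each iteration point j to the tile with integer coordinates floor(H j), which is the standard definition of a parallelepiped tiling. The matrix H, the triangular iteration space, and the helper names `tile_of` and `group_by_tile` are assumptions chosen only for this example.

```python
from collections import defaultdict
from fractions import Fraction
from math import floor

# Hypothetical 2-D tiling matrix H (an assumption for illustration).
# Each row is normal to one family of tile boundaries, so the tiles
# are skewed parallelograms rather than rectangles.
H = [
    [Fraction(1, 4), Fraction(0)],
    [Fraction(-1, 4), Fraction(1, 4)],
]

def tile_of(point):
    """Return the integer tile coordinates floor(H * point)."""
    return tuple(
        floor(sum(h * x for h, x in zip(row, point))) for row in H
    )

def group_by_tile(points):
    """Group iteration points by the tile that contains them."""
    tiles = defaultdict(list)
    for p in points:
        tiles[tile_of(p)].append(p)
    return dict(tiles)

# A non-rectangular (triangular) iteration space: 0 <= i < 8, 0 <= j <= i.
space = [(i, j) for i in range(8) for j in range(i + 1)]
tiles = group_by_tile(space)

# Print each tile with the number of iteration points it holds.
for t in sorted(tiles):
    print(t, len(tiles[t]))
```

Enumerating points and grouping them, as done here, is only practical for tiny spaces. A code generator instead has to emit loop bounds that scan the tiles and the points inside each tile directly, and tiles cut by the iteration space boundary hold fewer points than interior ones, which hints at why non-rectangular tiles and bounds make that step difficult.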