Parameterized loop tiling

Authors:
Lakshminarayanan Renganarayanan;Daegon Kim;Michelle Mills Strout;Sanjay Rajopadhye
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, NY;CORESPEQ, Inc., Milpitas, CA;Colorado State University, Fort Collins, CO;Colorado State University, Fort Collins, CO
Venue:
ACM Transactions on Programming Languages and Systems (TOPLAS)
Year:
2012

Citing 30
Cited 0

Supernode partitioning

POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A theory on extending algorithms for parametric problems

Mathematics of Operations Research
Scanning polyhedra with DO loops

PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A practical algorithm for exact array dependence analysis

Communications of the ACM
Dines-Fourier-Motzkin quantifier elimination and an application of corresponding transfer principles over ordered fields

Mathematical Programming: Series A and B
Communication optimization and code generation for distributed memory machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
(Pen)-ultimate tiling?

Integration, the VLSI Journal
SUIF: an infrastructure for research on parallelizing and optimizing compilers

ACM SIGPLAN Notices
Tile size selection using cache organization and data layout

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Determining the idle time of a tiling

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Parallelizing compiler techniques based on linear inequalities

Parallelizing compiler techniques based on linear inequalities
Optimal orthogonal tiling of 2-D iterations

Journal of Parallel and Distributed Computing
Accurately Selecting Block Size at Runtime in Pipelined Parallel Programs

International Journal of Parallel Programming
Loop tiling for parallelism

Loop tiling for parallelism
Generation of Efficient Nested Loops from Polyhedra

International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Automatically tuned linear algebra software

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Register tiling in nonrectangular iteration spaces

ACM Transactions on Programming Languages and Systems (TOPLAS)
Iteration Space Tiling for Memory Hierarchies

Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing
Iterative compilation

Embedded processor design challenges
Code generation for multiple mappings

FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation

PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Automatic Blocking of Nested Loops

Automatic Blocking of Nested Loops
Code Generation in the Polyhedral Model Is Easier Than You Think

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Implicit and explicit optimizations for stencil computations

Proceedings of the 2006 workshop on Memory system performance and correctness
Dynamic tiling for effective use of shared caches on multithreaded processors

International Journal of High Performance Computing and Networking
Multi-level tiling: M for the price of one

Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Parametric multi-level tiling of imperfectly nested loops

Proceedings of the 23rd international conference on Supercomputing
Compact multi-dimensional kernel extraction for register tiling

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
An efficient code generation technique for tiled iteration spaces

IEEE Transactions on Parallel and Distributed Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Loop tiling is a widely used program optimization that improves data locality and enables coarse-grained parallelism. Parameterized tiled loops, where the tile sizes remain symbolic parameters until runtime, are quite useful for iterative compilers and autotuners that produce highly optimized libraries and codes. Although it is easy to generate such loops for (hyper-) rectangular iteration spaces tiled with (hyper-) rectangular tiles, many important computations do not fall into this restricted domain. In the past, parameterized tiled code generation for the general case of convex iteration spaces being tiled by (hyper-) rectangular tiles has been solved with bounding box approaches or with sophisticated and expensive machinery. We present a novel formulation of the parameterized tiled loop generation problem using a polyhedral set called the outset. By reducing the problem of parameterized tiled code generation to that of generating standard loops and simple postprocessing of these loops, the outset method achieves a code generation efficiency that is comparable to existing code generation techniques, including those for fixed tile sizes. We compare the performance of our technique with several other tiled loop generation methods on kernels from BLAS3 and scientific computations. The simplicity of our solution makes it well suited for use in production compilers—in particular, the IBM XL compiler uses the inset-based technique introduced in this article for register tiling. We also provide a complete coverage of parameterized tiling of perfect loop nests by describing three related techniques: (i) a scheme for separating full and partial tiles; (ii) a scheme for generating tiled loops directly from the abstract syntax tree representation of loops; (iii) a formal characterization of parameterized loop tiling using bilinear forms and a Symbolic Fourier-Motzkin Elimination (SFME)-based parameterized tiled loop generation method.