POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A theory on extending algorithms for parametric problems
Mathematics of Operations Research
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A practical algorithm for exact array dependence analysis
Communications of the ACM
Mathematical Programming: Series A and B
Communication optimization and code generation for distributed memory machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Integration, the VLSI Journal
SUIF: an infrastructure for research on parallelizing and optimizing compilers
ACM SIGPLAN Notices
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Determining the idle time of a tiling
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Parallelizing compiler techniques based on linear inequalities
Parallelizing compiler techniques based on linear inequalities
Optimal orthogonal tiling of 2-D iterations
Journal of Parallel and Distributed Computing
Accurately Selecting Block Size at Runtime in Pipelined Parallel Programs
International Journal of Parallel Programming
Loop tiling for parallelism
Generation of Efficient Nested Loops from Polyhedra
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Automatically tuned linear algebra software
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Register tiling in nonrectangular iteration spaces
ACM Transactions on Programming Languages and Systems (TOPLAS)
Iteration Space Tiling for Memory Hierarchies
Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing
Embedded processor design challenges
Code generation for multiple mappings
FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Automatic Blocking of Nested Loops
Automatic Blocking of Nested Loops
Code Generation in the Polyhedral Model Is Easier Than You Think
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Implicit and explicit optimizations for stencil computations
Proceedings of the 2006 workshop on Memory system performance and correctness
Dynamic tiling for effective use of shared caches on multithreaded processors
International Journal of High Performance Computing and Networking
Multi-level tiling: M for the price of one
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Parametric multi-level tiling of imperfectly nested loops
Proceedings of the 23rd international conference on Supercomputing
Compact multi-dimensional kernel extraction for register tiling
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
An efficient code generation technique for tiled iteration spaces
IEEE Transactions on Parallel and Distributed Systems
Hi-index | 0.00 |
Loop tiling is a widely used program optimization that improves data locality and enables coarse-grained parallelism. Parameterized tiled loops, where the tile sizes remain symbolic parameters until runtime, are quite useful for iterative compilers and autotuners that produce highly optimized libraries and codes. Although it is easy to generate such loops for (hyper-) rectangular iteration spaces tiled with (hyper-) rectangular tiles, many important computations do not fall into this restricted domain. In the past, parameterized tiled code generation for the general case of convex iteration spaces being tiled by (hyper-) rectangular tiles has been solved with bounding box approaches or with sophisticated and expensive machinery. We present a novel formulation of the parameterized tiled loop generation problem using a polyhedral set called the outset. By reducing the problem of parameterized tiled code generation to that of generating standard loops and simple postprocessing of these loops, the outset method achieves a code generation efficiency that is comparable to existing code generation techniques, including those for fixed tile sizes. We compare the performance of our technique with several other tiled loop generation methods on kernels from BLAS3 and scientific computations. The simplicity of our solution makes it well suited for use in production compilers—in particular, the IBM XL compiler uses the inset-based technique introduced in this article for register tiling. We also provide a complete coverage of parameterized tiling of perfect loop nests by describing three related techniques: (i) a scheme for separating full and partial tiles; (ii) a scheme for generating tiled loops directly from the abstract syntax tree representation of loops; (iii) a formal characterization of parameterized loop tiling using bilinear forms and a Symbolic Fourier-Motzkin Elimination (SFME)-based parameterized tiled loop generation method.