Parameterized tiled loops for free

Authors:
Lakshminarayanan Renganarayanan;DaeGon Kim;Sanjay Rajopadhye;Michelle Mills Strout
Affiliations:
Colorado State University, Fort Collins, CO;Colorado State University, Fort Collins, CO;Colorado State University, Fort Collins, CO;Colorado State University, Fort Collins, CO
Venue:
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Year:
2007

Citing 23
Cited 17

Supernode partitioning

POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Scanning polyhedra with DO loops

PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A practical algorithm for exact array dependence analysis

Communications of the ACM
Communication optimization and code generation for distributed memory machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
(Pen)-ultimate tiling?

Integration, the VLSI Journal
SUIF: an infrastructure for research on parallelizing and optimizing compilers

ACM SIGPLAN Notices
Tile size selection using cache organization and data layout

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Determining the idle time of a tiling

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Parallelizing compiler techniques based on linear inequalities

Parallelizing compiler techniques based on linear inequalities
Optimal orthogonal tiling of 2-D iterations

Journal of Parallel and Distributed Computing
Accurately Selecting Block Size at Runtime in Pipelined Parallel Programs

International Journal of Parallel Programming
Loop tiling for parallelism

Loop tiling for parallelism
Generation of Efficient Nested Loops from Polyhedra

International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Automatically tuned linear algebra software

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Register tiling in nonrectangular iteration spaces

ACM Transactions on Programming Languages and Systems (TOPLAS)
Iterative compilation

Embedded processor design challenges
Code generation for multiple mappings

FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation

PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Code Generation in the Polyhedral Model Is Easier Than You Think

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Implicit and explicit optimizations for stencil computations

Proceedings of the 2006 workshop on Memory system performance and correctness
Dynamic tiling for effective use of shared caches on multithreaded processors

International Journal of High Performance Computing and Networking
An efficient code generation technique for tiled iteration spaces

IEEE Transactions on Parallel and Distributed Systems

Multi-level tiling: M for the price of one

Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Iterative optimization in the polyhedral model: part ii, multidimensional time

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A practical automatic polyhedral parallelizer and locality optimizer

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Parametric multi-level tiling of imperfectly nested loops

Proceedings of the 23rd international conference on Supercomputing
Compact multi-dimensional kernel extraction for register tiling

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Automatic creation of tile size selection models

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Parameterized tiling revisited

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
A model for fusion and code motion in an automatic parallelizing compiler

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Loop transformations: convexity, pruning and optimization

Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatic code generation and tuning for stencil kernels on modern shared memory architectures

Computer Science - Research and Development
Efficient tiled loop generation: D-tiling

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Logical inference techniques for loop parallelization

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Analytical bounds for optimal tile size selection

CC'12 Proceedings of the 21st international conference on Compiler Construction
A multi-objective auto-tuning framework for parallel codes

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Patus for convenient high-performance stencils: evaluation in earthquake simulations

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Polyhedral parallel code generation for CUDA

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Improving polyhedral code generation for high-level synthesis

Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis

Quantified Score

Hi-index	0.00

Visualization

Abstract

Parameterized tiled loops-where the tile sizes are not fixed at compile time, but remain symbolic parameters until later--are quite useful for iterative compilers and "auto-tuners" that produce highly optimized libraries and codes. Tile size parameterization could also enable optimizations such as register tiling to become dynamic optimizations. Although it is easy to generate such loops for (hyper) rectangular iteration spaces tiled with (hyper) rectangular tiles, many important computations do not fall into this restricted domain. Parameterized tile code generation for the general case of convex iteration spaces being tiled by (hyper) rectangular tiles has in the past been solved with bounding box approaches or symbolic Fourier Motzkin approaches. However, both approaches have less than ideal code generation efficiency and resulting code quality. We present the theoretical foundations, implementation, and experimental validation of a simple, unified technique for generating parameterized tiled code. Our code generation efficiency is comparable to all existing code generation techniques including those for fixed tile sizes, and the resulting code is as efficient as, if not more than, all previous techniques. Thus the technique provides parameterized tiled loops for free! Our "one-size-fits-all" solution, which is available as open source software can be adapted for use in production compilers.