Tiling imperfectly-nested loop nests

Authors:
Nawaaz Ahmed;Nikolay Mateev;Keshav Pingali
Affiliations:
Department of Computer Science, Cornell University, Ithaca, NY;Department of Computer Science, Cornell University, Ithaca, NY;Department of Computer Science, Cornell University, Ithaca, NY
Venue:
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Year:
2000

Citing 23
Cited 19

Technical note—engineering and scientific subroutine library release 3 for IBM ES/3090 vector multiprocessors

IBM Systems Journal
The cache performance and optimizations of blocked algorithms

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Scanning polyhedra with DO loops

PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
LAPACK's user's guide

LAPACK's user's guide
A practical algorithm for exact array dependence analysis

Communications of the ACM
Compiler blockability of numerical algorithms

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Access normalization: loop restructuring for NUMA computers

ACM Transactions on Computer Systems (TOCS)
Some efficient solutions to the affine scheduling problem: I. One-dimensional time

International Journal of Parallel Programming
A singular loop transformation framework based on non-singular matrices

International Journal of Parallel Programming
(Pen)-ultimate tiling?

Integration, the VLSI Journal
Tile size selection using cache organization and data layout

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Combining loop transformations considering caches and scheduling

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Data-centric multi-level blocking

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Cache miss equations: an analytical representation of cache misses

ICS '97 Proceedings of the 11th international conference on Supercomputing
Maximizing parallelism and minimizing synchronization with affine partitions

Parallel Computing - Special issues on languages and compilers for parallel computers
New tiling techniques to improve cache temporal locality

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An experimental evaluation of tiling and shackling for memory hierarchy management

ICS '99 Proceedings of the 13th international conference on Supercomputing
Synthesizing transformations for locality enhancement of imperfectly-nested loop nests

Proceedings of the 14th international conference on Supercomputing
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
Iteration Space Tiling for Memory Hierarchies

Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing
Code generation for multiple mappings

FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
Automatic Blocking of Nested Loops

Automatic Blocking of Nested Loops

Blocking and array contraction across arbitrarily nested loops using affine partitioning

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Increasing temporal locality with skewing and recursive blocking

Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Reducing 3D Fast Wavelet Transform Execution Time Using Blocking and the Streaming SIMD Extensions

Journal of VLSI Signal Processing Systems
Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Dynamic allocation for scratch-pad memory using compile-time decisions

ACM Transactions on Embedded Computing Systems (TECS)
Scheduling of Iterative Algorithms with Matrix Operations for Efficient FPGA Design--Implementation of Finite Interval Constant Modulus Algorithm

Journal of VLSI Signal Processing Systems
Effective automatic parallelization of stencil computations

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Buffer and Register Allocation for Memory Space Optimization

Journal of VLSI Signal Processing Systems
Dynamic tiling for effective use of shared caches on multithreaded processors

International Journal of High Performance Computing and Networking
Iterative optimization in the polyhedral model: part ii, multidimensional time

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Parametric multi-level tiling of imperfectly nested loops

Proceedings of the 23rd international conference on Supercomputing
Algorithms for memory hierarchies: advanced lectures

Algorithms for memory hierarchies: advanced lectures
On the interaction of tiling and automatic parallelization

IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
An extensible global address space framework with decoupled task and data abstractions

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Efficient search-space pruning for integrated fusion and tiling transformations

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Efficient tiled loop generation: D-tiling

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
An approach for semiautomatic locality optimizations using OpenMP

PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2
Hierarchical overlapped tiling

Proceedings of the Tenth International Symposium on Code Generation and Optimization
Polyhedral-based data reuse optimization for configurable computing

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

Quantified Score

Hi-index	0.00

Visualization

Abstract

Tiling is one of the more important transformations for enhancing loca lity of reference in programs. Intuitively, tiling a set of loops achieves the effect of interleaving iterations of these loops. Tiling of perfectly-nested loop nests (which are loop nests in which all assignment statements are contained in the innermost loop) is well understood. In practice, many loop nests are imperfectly nested, so existing compilers use heuristics to try to find a sequence of transformations that convert such loop nests into perfectly-nested ones, but these heuristics do not always succeed. In this paper, we propose a novel approach to tiling imperfectly-nested loop nests. The key idea is to embed the iteration space of every statement in the imperfectly-nested loop nest into a special space called the product space which is tiled to produce the final code. We evaluate the effectiveness of this approach for dense numrical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPEC benchmarks. No other single approach in the literature can tile all these codes automatically.