A data locality optimizing algorithm

Authors:
Monica S. Lam; Michael E. Wolf
Affiliations:
Stanford University, CA; Stanford University, CA
Venue:
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Year:
2004

Abstract

This paper proposes an algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling. The loop transformation algorithm is based on two concepts: a mathematical formulation of reuse and locality, and a loop transformation theory that unifies the various transforms as unimodular matrix transformations.

The algorithm has been implemented in the SUIF (Stanford University Intermediate Format) compiler, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation (SOR), LU decomposition without pivoting, and Givens QR factorization. Performance evaluation indicates that locality optimization is especially crucial for scaling up the performance of parallel code.
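As an illustration of the ideas named in the abstract, the following is a minimal sketch and not the SUIF implementation. It assumes a two-deep loop nest, represents interchange, reversal and skewing as 2x2 integer matrices with determinant ±1, and treats a transformation as legal when every transformed dependence vector stays lexicographically positive. The names `INTERCHANGE`, `REVERSE_INNER`, `SKEW_INNER`, `is_legal`, the sample dependence set `DEPS`, and the tiled matrix multiplication `matmul_tiled` are all hypothetical and chosen only for this example.

```python
# Unimodular transformations of a 2-deep loop nest (illustrative names).
INTERCHANGE = [[0, 1], [1, 0]]     # (i, j) -> (j, i)
REVERSE_INNER = [[1, 0], [0, -1]]  # (i, j) -> (i, -j)
SKEW_INNER = [[1, 0], [1, 1]]      # (i, j) -> (i, i + j)


def det2(m):
    """Determinant of a 2x2 integer matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def is_unimodular(m):
    """A unimodular matrix has integer entries and determinant +1 or -1."""
    return abs(det2(m)) == 1


def apply(m, v):
    """Apply matrix m to an iteration or dependence vector v."""
    return tuple(sum(m[r][c] * v[c] for c in range(2)) for r in range(2))


def compose(a, b):
    """Matrix product a @ b: transformation b is applied first, then a."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def lex_positive(v):
    """True if the first nonzero component of v is positive."""
    for x in v:
        if x != 0:
            return x > 0
    return False


def is_legal(m, deps):
    """Assumed legality test: all transformed dependences stay lex-positive."""
    return all(lex_positive(apply(m, d)) for d in deps)


# Hypothetical dependence vectors for a small stencil-like loop nest.
DEPS = [(0, 1), (1, -1)]


def matmul_naive(a, b, n):
    """Reference i-j-k matrix multiplication on n x n lists of lists."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile):
    """Tiled variant: the k and j loops are blocked so that a tile x tile
    block of b is reused across all values of i before moving on."""
    c = [[0] * n for _ in range(n)]
    for kk in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(n):
                for k in range(kk, min(kk + tile, n)):
                    aik = a[i][k]
                    for j in range(jj, min(jj + tile, n)):
                        c[i][j] += aik * b[k][j]
    return c
```

With the sample dependences, `is_legal(INTERCHANGE, DEPS)` is false because (1, -1) maps to (-1, 1), while `is_legal(SKEW_INNER, DEPS)` is true and leaves every dependence component non-negative, which is the situation in which interchange and tiling can then be applied safely. Composing transformations is just matrix multiplication, for example `compose(INTERCHANGE, SKEW_INNER)`.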