Improving data locality by chunking

Authors:
Cédric Bastoul; Paul Feautrier
Affiliations:
Laboratoire PRiSM, Université de Versailles Saint Quentin, Versailles Cedex, France; École Normale Supérieure de Lyon, Lyon, France
Venue:
CC'03 Proceedings of the 12th international conference on Compiler construction
Year:
2003

Abstract

Cache memories were invented to decouple fast processors from slow memories. This decoupling is only partial, however, and many researchers have tried to improve cache use through program optimization. The potential benefits are significant, since both energy dissipation and performance depend heavily on the traffic between memory levels. Modeling this traffic is difficult, which has led to the use of heuristic methods for steering program transformations. In this paper, we propose another approach: we simplify the cache model and organize the target program so that an asymptotic evaluation of the memory traffic becomes possible. Our optimization algorithm uses this information to find the best reordering of the program operations, at least in an asymptotic sense. The method optimizes both temporal and spatial locality and can be applied to any static control program with arbitrary dependences. The optimizer has been partially implemented and applied to non-trivial programs. We present experimental evidence that the number of cache misses is drastically reduced, with corresponding performance improvements.
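
As a rough illustration of the idea, and not the paper's algorithm, the sketch below counts misses under a deliberately simplified cache model and compares two orderings of the same operations. Everything in it is an assumption made for the example: the cache is fully associative with LRU replacement and one element per line (so only temporal locality is modeled), the toy program makes two passes over an array, and the names `count_misses`, `original_trace`, and `chunked_trace` are hypothetical.

```python
from collections import OrderedDict


def count_misses(trace, capacity):
    """Count misses for a fully associative LRU cache of `capacity` elements.

    Assumed model: one array element per cache line, no prefetching.
    """
    cache = OrderedDict()
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # hit: mark as most recently used
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return misses


def original_trace(n):
    """Two full passes over a[0..n-1]; each reuse is n accesses away."""
    return list(range(n)) + list(range(n))


def chunked_trace(n, chunk):
    """Same accesses, reordered so both passes run on one chunk at a time.

    Assumes the second pass on element i depends only on the first pass
    on element i, so this reordering preserves the dependences.
    """
    trace = []
    for start in range(0, n, chunk):
        block = range(start, min(start + chunk, n))
        trace.extend(block)  # first pass on this chunk
        trace.extend(block)  # second pass while the chunk is still cached
    return trace


N, CAPACITY = 1024, 64
misses_original = count_misses(original_trace(N), CAPACITY)
misses_chunked = count_misses(chunked_trace(N, CAPACITY), CAPACITY)
print(misses_original, misses_chunked)
```

In this toy setting every access misses in the original order, because the array is larger than the cache, while the chunked order misses only once per element. Choosing the chunk size so that the data touched by one chunk fits in the cache is what makes the reuse hit.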