An accurate cost model for guiding data locality transformations

Authors:
Xavier Vera;Jaume Abella;Josep Llosa;Antonio González
Affiliations:
Mälardalens Högskola, Västerås, Sweden;Universitat Politècnica de Catalunya-Barcelona, Barcelona, Spain;Universitat Politècnica de Catalunya-Barcelona, Barcelona, Spain;Universitat Politècnica de Catalunya-Barcelona, Barcelona, Spain
Venue:
ACM Transactions on Programming Languages and Systems (TOPLAS)
Year:
2005

Citing 31
Cited 1

Strategies for cache and local memory management by global program transformation

Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Numerical techniques for stochastic optimization

Numerical techniques for stochastic optimization
On the complexity of computing the volume of a polyhedron

SIAM Journal on Computing
Global optimization

Global optimization
The cache performance and optimizations of blocked algorithms

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Single instruction stream parallelism is greater than two

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Nonlinear optimization: complexity issues

Nonlinear optimization: complexity issues
Adaptation in natural and artificial systems

Adaptation in natural and artificial systems
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Compiler blockability of numerical algorithms

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Compiler optimizations for improving data locality

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Tile size selection using cache organization and data layout

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Improving data locality with loop transformations

ACM Transactions on Programming Languages and Systems (TOPLAS)
A quantitative analysis of loop nest locality

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: applications to analyze and transform scientific programs

ICS '96 Proceedings of the 10th international conference on Supercomputing
Data transformations for eliminating conflict misses

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Eliminating conflict misses for high performance architectures

ICS '98 Proceedings of the 12th international conference on Supercomputing
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts

IEEE Transactions on Parallel and Distributed Systems
Nonlinear array layouts for hierarchical memory systems

ICS '99 Proceedings of the 13th international conference on Supercomputing
Cache miss equations: a compiler framework for analyzing and tuning memory behavior

ACM Transactions on Programming Languages and Systems (TOPLAS)
Locality optimizations for multi-level caches

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
The hardness of cache conscious data placement

POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Genetic Algorithms Plus Data Structures Equals Evolution Programs

Genetic Algorithms Plus Data Structures Equals Evolution Programs
Genetic Algorithms in Search, Optimization and Machine Learning

Genetic Algorithms in Search, Optimization and Machine Learning
Dependence Analysis for Supercomputing

Dependence Analysis for Supercomputing
Tabu Search

Tabu Search
Itanium Processor Microarchitecture

IEEE Micro
A fast and accurate framework to analyze and optimize cache memory behavior

ACM Transactions on Programming Languages and Systems (TOPLAS)
An efficient solver for Cache Miss Equations

ISPASS '00 Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software

Using Padding to Optimize Locality in Scientific Applications

ICCS '08 Proceedings of the 8th international conference on Computational Science, Part I

Quantified Score

Hi-index	0.00

Visualization

Abstract

Caches have become increasingly important with the widening gap between main memory and processor speeds. Small and fast cache memories are designed to bridge this discrepancy. However, they are only effective when programs exhibit sufficient data locality.The performance of the memory hierarchy can be improved by means of data and loop transformations. Tiling is a loop transformation that aims at reducing capacity misses by shortening the reuse distance. Padding is a data layout transformation targeted to reduce conflict misses.This article presents an accurate cost model that describes misses across different hierarchy levels and considers the effects of other hardware components such as branch predictors. The cost model drives the application of tiling and padding transformations. We combine the cost model with a genetic algorithm to compute the tile and pad factors that enhance the program performance.To validate our strategy, we ran experiments for a set of benchmarks on a large set of modern architectures. Our results show that this scheme is useful to optimize programs' performance. When compared to previous approaches, we observe that with a reasonable compile-time overhead, our approach gives significant performance improvements for all studied kernels on all architectures.