Exploiting program locality is key to reducing the execution time of scientific applications, and many techniques have been designed for locality optimization. This paper presents new compiler algorithms based on array padding that optimize program locality either locally (at the loop level) or globally (across the whole program). We first introduce a formal cache model that is used to analyze how all cache levels are filled when arrays inside nested loops are referenced. We then study the relation between the model parameters and the memory layout of the arrays, and define how to pad those arrays to optimize cache occupancy at all levels. Experimental evaluation on several numerical benchmarks shows the benefits of our approach.
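
To make the idea of array padding concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes a single direct-mapped cache level and two equally sized arrays accessed in lockstep (a[i] and b[i] in the same loop iteration). The helper names (cache_set, count_conflicts, smallest_pad) and the cache parameters are hypothetical and chosen only for illustration. The sketch counts iterations in which both references map to the same cache set, then searches for the smallest gap, in whole cache lines, inserted between the arrays that removes those conflicts.

```python
def cache_set(addr, line_size, num_sets):
    """Set index of a byte address in a direct-mapped cache (assumed model)."""
    return (addr // line_size) % num_sets


def count_conflicts(base_a, base_b, n, elem_size, line_size, num_sets):
    """Count iterations i where a[i] and b[i] fall in the same set but on different lines."""
    conflicts = 0
    for i in range(n):
        addr_a = base_a + i * elem_size
        addr_b = base_b + i * elem_size
        same_set = cache_set(addr_a, line_size, num_sets) == cache_set(addr_b, line_size, num_sets)
        different_line = addr_a // line_size != addr_b // line_size
        if same_set and different_line:
            conflicts += 1
    return conflicts


def smallest_pad(base_a, array_bytes, n, elem_size, line_size, num_sets):
    """Smallest padding in bytes (a multiple of line_size) placed between a and b
    that leaves no conflicting iteration, or None if no such padding exists."""
    pad = 0
    while pad < line_size * num_sets:  # padding by a full cache size repeats the mapping
        base_b = base_a + array_bytes + pad
        if count_conflicts(base_a, base_b, n, elem_size, line_size, num_sets) == 0:
            return pad
        pad += line_size
    return None


# Hypothetical configuration: 8 KiB direct-mapped cache with 32-byte lines,
# and two arrays of 1024 eight-byte elements stored back to back.
LINE_SIZE, NUM_SETS, ELEM_SIZE, N = 32, 256, 8, 1024
ARRAY_BYTES = N * ELEM_SIZE

unpadded = count_conflicts(0, ARRAY_BYTES, N, ELEM_SIZE, LINE_SIZE, NUM_SETS)
pad = smallest_pad(0, ARRAY_BYTES, N, ELEM_SIZE, LINE_SIZE, NUM_SETS)
print(unpadded, pad)
```

In this configuration each array is exactly as large as the cache, so without padding every pair a[i], b[i] competes for the same set. Shifting the second array by one cache line is enough to separate them. A multi-level model such as the one described in the abstract would have to satisfy this kind of constraint at every cache level simultaneously.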