It has been observed that memory access performance can be improved by restructuring data declarations with simple transformations such as array dimension padding and inter-array padding (array alignment), which reduce the number of misses in the cache and TLB (translation lookaside buffer). These transformations can be applied to both static and dynamic array variables. In this paper, we present a padding algorithm that selects appropriate padding amounts while accounting for the various cache and TLB effects collectively within a single framework. Beyond reducing the number of misses, we identify the importance of mitigating cache miss jamming by spreading cache misses more uniformly across loop iterations. We translate undesirable cache and TLB behaviors into a set of constraints on padding amounts and propose a polynomial-time heuristic algorithm that finds padding amounts satisfying these constraints. The goal of the padding algorithm is to select padding amounts such that, for a given loop, there are no set conflicts and no offset conflicts in either the cache or the TLB. In practice, the algorithm efficiently finds small padding amounts that satisfy these constraints.