It has been observed that memory access performance can be improved by restructuring data declarations with simple transformations such as array dimension padding and inter-array padding (array alignment), which reduce the number of misses in the cache and TLB (translation lookaside buffer). These transformations can be applied to both static and dynamic array variables. In this paper, we present a padding algorithm that selects appropriate padding amounts while accounting for the various cache and TLB effects collectively within a single framework. Beyond reducing the number of misses, we identify the importance of mitigating cache miss jamming by spreading cache misses more uniformly across loop iterations. We translate undesirable cache and TLB behaviors into a set of constraints on padding amounts and propose a polynomial-time heuristic algorithm that finds padding amounts satisfying these constraints. The goal of the padding algorithm is to select padding amounts such that, for a given loop, there are no set conflicts and no offset conflicts in either the cache or the TLB. In practice, the algorithm efficiently finds small padding amounts that satisfy these constraints.