A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness

  • Authors:
  • David F. Bacon; Jyh-Herng Chow; Dz-ching R. Ju; Kalyan Muthukumar; Vivek Sarkar

  • Affiliations:
  • Application Development Technology Institute, IBM Software Solutions Division, 555 Bailey Avenue, San Jose, CA (all authors)

  • Venue:
  • CASCON '94 Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research
  • Year:
  • 1994

Abstract

It has been observed that memory access performance can be improved by restructuring data declarations with simple transformations such as array dimension padding and inter-array padding (array alignment), which reduce the number of misses in the cache and the TLB (translation lookaside buffer). These transformations can be applied to both static and dynamic array variables. In this paper, we provide a padding algorithm for selecting appropriate padding amounts that takes into account various cache and TLB effects collectively within a single framework. In addition to reducing the number of misses, we identify the importance of reducing the impact of cache miss jamming by spreading cache misses more uniformly across loop iterations.

We translate undesirable cache and TLB behaviors into a set of constraints on padding amounts and propose a heuristic algorithm of polynomial time complexity to find padding amounts that satisfy these constraints. The goal of the padding algorithm is to select padding amounts so that, for a given loop, there are no set conflicts and no offset conflicts in the cache and TLB. In practice, this algorithm can efficiently find small padding amounts that satisfy these constraints.
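
As a rough illustration of why array dimension padding can remove cache conflicts, the following sketch models one simple case: walking down a single column of a row-major array in a set-associative cache. It is not the padding algorithm of the paper. The function names, the brute-force search, and the cache parameters (64-byte lines, 128 sets, direct-mapped) are all assumptions chosen for illustration, and the model ignores the TLB, inter-array padding, and miss jamming.

```python
from collections import Counter


def cache_set(addr, line_size, num_sets):
    """Cache set index that a byte address maps to (hypothetical cache model)."""
    return (addr // line_size) % num_sets


def column_walk_sets(rows, row_len, elem_size, line_size, num_sets):
    """Sets touched when reading element [r][0] for every row r of a
    row-major array whose declared row length is row_len elements."""
    stride = row_len * elem_size  # bytes between vertically adjacent elements
    return [cache_set(r * stride, line_size, num_sets) for r in range(rows)]


def max_lines_per_set(sets):
    """Largest number of accessed lines that compete for one cache set."""
    return max(Counter(sets).values())


def smallest_pad(rows, cols, elem_size, line_size, num_sets, assoc, max_pad=64):
    """Brute-force search (an assumption, not the paper's heuristic) for the
    smallest number of padding elements per row such that no set receives
    more lines than the associativity can hold. Returns None if no pad up
    to max_pad works."""
    for pad in range(max_pad + 1):
        sets = column_walk_sets(rows, cols + pad, elem_size, line_size, num_sets)
        if max_lines_per_set(sets) <= assoc:
            return pad
    return None


if __name__ == "__main__":
    # Assumed example: double a[64][1024] with an 8 KB direct-mapped cache.
    params = dict(elem_size=8, line_size=64, num_sets=128)
    unpadded = column_walk_sets(64, 1024, **params)
    print("lines competing for one set, unpadded:", max_lines_per_set(unpadded))
    print("smallest pad (elements):", smallest_pad(64, 1024, assoc=1, **params))
```

With these assumed parameters the unpadded row stride is exactly 128 cache lines, so every row of the column maps to the same set. Padding each row changes the stride so that successive rows fall into different sets, which is the effect that constraints on padding amounts are meant to capture.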