Two Fast and High-Associativity Cache Schemes

  • Authors:
  • Chenxi Zhang, Xiaodong Zhang, Yong Yan

  • Venue:
  • IEEE Micro
  • Year:
  • 1997

Abstract

A traditional implementation of a set-associative cache has a longer access cycle time than that of a direct-mapped cache. Several methods have been proposed for implementing associativity in non-traditional ways, but most of them achieve an associativity of only two, and those that reach higher associativity still have longer access cycle times or suffer from larger average access times. In this paper, we first systematically address the implementation issues of associativity and evaluate existing implementation methods. We then propose two schemes for implementing higher associativity: the Sequential Multi-Column Cache, an extension of the Column Associative Cache, and the Parallel Multi-Column Cache. To achieve the same access cycle time as a direct-mapped cache, both schemes organize the cache's data memory as a single bank, and both use a multiple-MRU-block technique to raise the first-hit ratio and thus reduce the average access time. The Parallel Multi-Column Cache checks tags in parallel, whereas the Sequential Multi-Column Cache searches the places in a set one at a time, using index information to filter out unnecessary probes. At an associativity of 4, both schemes achieve the low miss rate of a 4-way set-associative cache. Our simulation results using ATUM traces show that both schemes effectively reduce the average access time: for a 4K-byte cache with a 20-cycle miss penalty, they improve the average access time over a direct-mapped cache by 9.8% and 10.8% on average, respectively, whereas the Column Associative Cache improves it by only 4.3%. The improvement of the Sequential Multi-Column Cache reaches 22.4% at an associativity of 8 with a 100-cycle miss penalty. Both schemes are effective for small and large caches alike (1K bytes to 128K bytes).
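
To make the sequential scheme concrete, here is a minimal behavioral sketch of a sequential multi-column lookup. It is an illustration under simplifying assumptions, not the authors' implementation: the class name and parameters are hypothetical, a per-set valid-bit vector stands in for the paper's index information, and replacement is reduced to displacing the occupant of the first-probe place into a neighboring column.

```python
import math

ASSOC = 4        # places ("columns") per set; the paper evaluates 4 and 8
SETS = 32        # 32 sets x 4 places x 32-byte blocks = a 4K-byte cache
BLOCK = 32       # bytes per block (an assumed size)
SET_BITS = int(math.log2(SETS))

class SequentialMultiColumnCache:
    """One data bank indexed like a direct-mapped cache, so the first
    probe is as fast as a direct-mapped access; the other places of a
    set are probed sequentially, and a per-set valid-bit filter skips
    places that cannot satisfy the probe."""

    def __init__(self):
        self.tag = [None] * (SETS * ASSOC)             # block number per frame
        self.valid = [[False] * ASSOC for _ in range(SETS)]

    def access(self, addr):
        blk = addr // BLOCK
        frame = blk % (SETS * ASSOC)                   # direct-mapped index
        s, home = frame & (SETS - 1), frame >> SET_BITS
        if self.tag[frame] == blk:                     # first probe: the MRU
            return "first hit"                         # place of the block
        for col in range(ASSOC):                       # sequential search of
            if col == home or not self.valid[s][col]:  # the remaining places,
                continue                               # skipping filtered ones
            f = (col << SET_BITS) | s
            if self.tag[f] == blk:
                # swap so this (now MRU) block moves to the first-probe place
                self.tag[frame], self.tag[f] = self.tag[f], self.tag[frame]
                return "slow hit"
        # Miss: displace the occupant of the first-probe place into the next
        # column (round-robin here; the paper's placement is smarter) and
        # fill the first-probe place with the new block.
        vf = (((home + 1) % ASSOC) << SET_BITS) | s
        self.tag[vf] = self.tag[frame]
        self.valid[s][(home + 1) % ASSOC] = self.tag[vf] is not None
        self.tag[frame] = blk
        self.valid[s][home] = True
        return "miss"

cache = SequentialMultiColumnCache()
# Two blocks that conflict in a direct-mapped cache coexist here:
print([cache.access(a) for a in (0x0000, 0x0000, 0x1000, 0x0000)])
# -> ['miss', 'first hit', 'miss', 'slow hit']
```

Because the data memory is a single direct-mapped bank, the first probe costs the same cycle as a direct-mapped access, and only slow hits and misses add cycles. Under the usual model (an assumption; the paper's exact accounting may differ), the average access time is roughly t_first + p_slow * t_probe + m * t_penalty, which is the quantity the improvement percentages above compare against a direct-mapped baseline.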