Parallel Cachelets

  • Venue: ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
  • Year: 2001

Abstract

A low-latency, high-bandwidth Level 1 data cache is crucial for achieving high performance in future superscalar microprocessors. The Parallel Cachelets (PC) scheme proposed in this paper provides bandwidth close to that of a multi-ported cache with implementation efficiency close to that of a multi-banked cache. In the PC scheme, the traditional Level 1 data cache is replaced by a set of parallel, independent, single-ported caches, or cachelets. As in an interleaved multi-banked design, the cachelets can be accessed concurrently to provide high bandwidth. However, instead of being mapped to banks in an interleaved fashion based on address bits, data elements are dynamically assigned to cachelets based on the pattern of concurrent accesses, which eliminates many bank conflicts. Furthermore, the PC scheme exhibits implicit set associativity, which allows it to outperform a direct-mapped multi-ported cache on some benchmarks. The PC scheme outperforms the multi-banked scheme by an average of 6% in IPC across a set of SPEC95 benchmarks and comes very close to matching the performance of the multi-ported scheme. When cache access latency is taken into account, the PC scheme outperforms even the multi-ported scheme, by 6.4%.
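
The contrast between address-interleaved banks and access-pattern-based placement can be illustrated with a small sketch. It is not the paper's mechanism: the abstract does not specify the assignment policy, so `toy_dynamic_assignment` is a hypothetical first-touch rule, and the line size, bank count, and addresses are assumed values chosen only to make conflicts visible. Capacity, replacement, and the lookup of which cachelet holds a line are all ignored.

```python
from collections import Counter

LINE_BYTES = 32  # assumed cache-line size
NUM_BANKS = 4    # assumed number of single-ported banks / cachelets


def interleaved_bank(addr, line_bytes=LINE_BYTES, num_banks=NUM_BANKS):
    """Line-interleaved mapping: the bank is fixed by low-order line-address bits."""
    return (addr // line_bytes) % num_banks


def count_conflicts(banks):
    """Accesses that must wait because another access issued in the same
    cycle targets the same single-ported bank."""
    return sum(n - 1 for n in Counter(banks).values())


def total_conflicts(groups, bank_of):
    """Sum conflicts over groups of addresses issued in the same cycle."""
    return sum(count_conflicts([bank_of(a) for a in g]) for g in groups)


def toy_dynamic_assignment(groups, line_bytes=LINE_BYTES, num_banks=NUM_BANKS):
    """Hypothetical policy (an assumption, not taken from the paper): on first
    touch, place a line in a cachelet not already holding another line that is
    accessed in the same cycle."""
    home = {}
    for group in groups:
        lines = [a // line_bytes for a in group]
        for line in lines:
            if line in home:
                continue
            used = {home[l] for l in lines if l in home}
            free = [c for c in range(num_banks) if c not in used]
            home[line] = free[0] if free else 0  # fall back to cachelet 0
    return home


# Each inner list is a group of addresses issued in the same cycle.
# A stride of LINE_BYTES * NUM_BANKS sends every address to bank 0
# under interleaving.
groups = [[0, 128, 256, 384], [0, 128], [256, 384]]

interleaved_conflicts = total_conflicts(groups, interleaved_bank)

home = toy_dynamic_assignment(groups)
dynamic_conflicts = total_conflicts(groups, lambda a: home[a // LINE_BYTES])

print(interleaved_conflicts, dynamic_conflicts)
```

With this access pattern, every address falls in the same interleaved bank, while the toy policy spreads the concurrently accessed lines across cachelets. Real workloads would also be limited by cachelet capacity and by how well past concurrency predicts future concurrency, which this sketch does not model.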