Abstract: A low-latency, high-bandwidth Level 1 data cache is crucial for achieving high performance in future superscalar microprocessors. The Parallel Cachelets (PC) scheme proposed in this paper provides bandwidth close to that of a multi-ported cache with implementation efficiency close to that of a multi-banked cache. In the PC scheme, the traditional Level 1 data cache is replaced by a set of parallel, independent, single-ported caches, or cachelets. As in an interleaved multi-banked design, the cachelets can be accessed concurrently to provide bandwidth. However, instead of being mapped to banks in an interleaved fashion based on address bits, data elements are dynamically assigned to cachelets based on the pattern of concurrent accesses, which eliminates many bank conflicts. Furthermore, the PC scheme exhibits implicit set associativity, which allows it to outperform a direct-mapped multi-ported cache on some benchmarks. The PC scheme outperforms the multi-banked scheme by an average of 6% in IPC across a set of SPEC95 benchmarks and comes very close to matching the performance of the multi-ported scheme. When cache access latency is taken into account, the PC scheme even outperforms the multi-ported scheme by 6.4%.
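
To make the notion of a bank conflict concrete, the following is a minimal sketch of the baseline the abstract compares against: an address-interleaved multi-banked cache with single-ported banks. It is not the paper's simulator. The line size, the bank count, and the names `bank_of` and `count_conflicts` are assumptions chosen for illustration only.

```python
from collections import Counter

LINE_BYTES = 32   # assumed cache-line size (not taken from the paper)
NUM_BANKS = 4     # assumed number of single-ported banks (not taken from the paper)

def bank_of(address, line_bytes=LINE_BYTES, num_banks=NUM_BANKS):
    """Address-interleaved mapping: the low-order bits of the line index pick the bank."""
    return (address // line_bytes) % num_banks

def count_conflicts(access_groups):
    """Count accesses that collide with another access to the same bank in the same cycle.

    Each element of access_groups is a tuple of addresses issued in one cycle.
    A single-ported bank serves one access per cycle, so every extra access
    to the same bank in that cycle counts as one conflict.
    """
    conflicts = 0
    for group in access_groups:
        per_bank = Counter(bank_of(addr) for addr in group)
        conflicts += sum(n - 1 for n in per_bank.values())
    return conflicts

# Hypothetical trace: two cycles with two concurrent accesses each.
groups = [(0x0000, 0x0080), (0x0020, 0x0040)]
print(count_conflicts(groups))
```

In this hypothetical trace the first pair of addresses falls into the same bank because the mapping depends only on address bits, while the second pair does not. The PC scheme described in the abstract instead assigns data to cachelets according to which accesses tend to occur together; that assignment mechanism is not modeled here.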