The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Evaluation of design alternatives for a multiprocessor microprocessor
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
IEEE Transactions on Computers
Exploring the Design Space of Future CMPs
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs
Proceedings of the 32nd annual international symposium on Computer Architecture
Cooperative Caching for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
CATS: cycle accurate transaction-driven simulation with multiple processor simulators
Proceedings of the conference on Design, automation and test in Europe
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Hi-index | 0.00 |
In this paper, we propose a novel on-chip L2 cache organization for chip multiprocessors (CMPs) with private L2 caches. The proposed approach, called reusability-aware cache sharing (RACS), combines the advantages of both a private L2 cache and a shared L2 cache. Since a private L2 cache organization has a short access latency, the RACS scheme employs a private L2 cache organization. However, when a cache block in a private L2 cache is selected for eviction, RACS first evaluates its reusability. If the block is likely to be reused in the near future, it may be saved to a peer L2 cache which has space available. In this way, the RACS scheme effectively simulates the larger capacity of a shared L2 cache. Simulation results show that RACS reduced the number of off-chip memory accesses by 24% compared to a pure private L2 cache organization on average for the SPLASH 2 multi-threaded benchmarks, and by 16% for multi-programmed benchmarks.