The inherent temporal locality in memory accesses is filtered out by the L1 cache. As a consequence, an L2 cache with LRU replacement incurs significantly more misses than the optimal replacement policy (OPT). We propose to narrow this gap through a novel replacement strategy that mimics the replacement decisions of OPT. The L2 cache is logically divided into two components: a Shepherd Cache (SC) with simple FIFO replacement and a Main Cache (MC) that emulates optimal replacement. The SC plays a dual role, caching lines and guiding the replacement decisions in the MC. Our proposed organization can cover 40% of the gap between OPT and LRU for a 2MB cache, resulting in a 7% overall speedup. Comparison with the dynamic insertion policy, a victim buffer, a V-Way cache, and an LRU-based fully associative cache demonstrates that our scheme performs better than all of these strategies.
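To make the mechanism concrete, here is a minimal Python sketch of one cache set under such a scheme. The class name ShepherdSet, the way counts, and the bookkeeping (a per-SC-entry map of first observed reuse times) are illustrative assumptions for this sketch, not the paper's actual hardware structures; the paper's counter matrix and tie-breaking rules are simplified.

from collections import OrderedDict

class ShepherdSet:
    """Simplified model of one cache set split into a Shepherd Cache (SC)
    front-end and a Main Cache (MC). Lines are first "shepherded" in a
    small FIFO; the accesses observed during that window approximate
    OPT's lookahead when choosing an MC victim."""

    def __init__(self, sc_ways=2, mc_ways=6):
        self.sc_ways = sc_ways
        self.mc_ways = mc_ways
        self.sc = OrderedDict()   # tag -> {tag: first observed reuse time}, FIFO order
        self.mc = set()           # tags resident in the Main Cache
        self.clock = 0            # per-set access counter

    def access(self, tag):
        """Return True on a hit. On a miss the line is inserted into SC;
        a full SC graduates its oldest line into MC."""
        self.clock += 1
        hit = tag in self.sc or tag in self.mc
        if hit:
            # Record the first reuse of `tag` seen by every shepherding line.
            for window in self.sc.values():
                window.setdefault(tag, self.clock)
        else:
            self._insert(tag)
        return hit

    def _insert(self, tag):
        if len(self.sc) == self.sc_ways:
            self._graduate()
        self.sc[tag] = {}         # fresh observation window for the new line

    def _graduate(self):
        """Move the FIFO head of SC into MC, emulating OPT: evict the MC
        line whose next use was observed furthest in the future; lines
        never reused during the window rank last of all."""
        old_tag, window = self.sc.popitem(last=False)
        if len(self.mc) < self.mc_ways:
            self.mc.add(old_tag)
            return
        INF = float('inf')
        victim = max(self.mc, key=lambda t: window.get(t, INF))
        if window.get(victim, INF) > window.get(old_tag, INF):
            self.mc.remove(victim)
            self.mc.add(old_tag)
        # Otherwise the graduating line itself is the furthest-reuse
        # candidate and is dropped (self-replacement / bypass).

The point the sketch tries to capture is the delayed replacement decision: rather than requiring OPT's impossible knowledge of future references, the shepherding window defers the MC victim choice until real future accesses have actually been observed.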