A model for estimating trace-sample miss ratios
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Selective, accurate, and timely self-invalidation using last-touch prediction
Proceedings of the 27th annual international symposium on Computer architecture
Dead-block prediction & dead-block correlating prefetchers
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Timekeeping in the memory system: predicting and optimizing memory behavior
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Using the Compiler to Improve Cache Replacement Decisions
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
An Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Adaptive mode control: A static-power-efficient cache design
ACM Transactions on Embedded Computing Systems (TECS)
Automatically Mapping Code on an Intelligent Memory Architecture
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Tracking object life cycle for leakage energy optimization
Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Proceedings of the 18th annual international conference on Supercomputing
A study of replacement algorithms for a virtual-storage computer
IBM Systems Journal
Evaluation techniques for storage hierarchies
IBM Systems Journal
Investigating cache energy and latency break-even points in high performance processors
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
Investigating cache energy and latency break-even points in high performance processors
ACM SIGARCH Computer Architecture News
ACM SIGARCH Computer Architecture News
Less reused filter: improving l2 cache performance via filtering less reused lines
Proceedings of the 23rd international conference on Supercomputing
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
Proceedings of the 36th annual international symposium on Computer architecture
Where replacement algorithms fail: a thorough analysis
Proceedings of the 7th ACM international conference on Computing frontiers
LRU-PEA: a smart replacement policy for non-uniform cache architectures on chip multiprocessors
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Instruction-based reuse-distance prediction for effective cache management
SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
Dynamic access distance driven cache replacement
ACM Transactions on Architecture and Code Optimization (TACO)
Replacement techniques for dynamic NUCA cache designs on CMPs
The Journal of Supercomputing
Hi-index | 0.00 |
Recent studies have shown that in highly associative caches, the performance gap between the Least Recently Used (LRU) and the theoretical optimal replacement algorithms is large, suggesting that alternative replacement algorithms can improve the performance of the cache. One main reason for this performance gap is that in the LRU replacement algorithm, a line is only evicted after it becomes the LRU line, long after its last access/touch, while unnecessarily occupying the cache space for a long time. This paper proposes a new approach to deal with the problem: counterbased L2 cache replacement . In this approach, each line in the L2 cache is augmented with an event counter that is incremented when an event of interest, such as a cache access to the same set, occurs. When the counter exceeds a threshold, the line "expires", and becomes evictable. When expired lines are evicted early from the cache, they make extra space for lines that may be more useful, reducing the number of capacity and conflict misses. Each line驴s threshold is unique and is dynamically learned and stored in a small 40-Kbyte counter prediction table. We propose two new replacement algorithms: Access Interval Predictor (AIP) and Live-time Predictor (LvP). AIP and LvP speed up 10 (out of 21) SPEC2000 benchmarks by up to 40% and 11% on average.