An inter-reference gap model for temporal locality in program behavior
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
A modified approach to data cache management
Proceedings of the 28th annual international symposium on Microarchitecture
EELRU: simple and effective adaptive page replacement
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
IEEE Transactions on Computers
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Proceedings of the 18th annual international conference on Supercomputing
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
SARC: sequential prefetching in adaptive replacement cache
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Adaptive insertion policies for high performance caching
Proceedings of the 34th annual international symposium on Computer architecture
Scavenger: A New Last Level Cache Architecture with Global Block Priority
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Emulating Optimal Replacement with a Shepherd Cache
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Counter-Based Cache Replacement and Bypassing Algorithms
IEEE Transactions on Computers
Adaptive insertion policies for managing shared caches
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
Proceedings of the 36th annual international symposium on Computer architecture
A study of replacement algorithms for a virtual-storage computer
IBM Systems Journal
Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
SHARP control: controlled shared cache management in chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Instruction-based reuse-distance prediction for effective cache management
SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
High performance cache replacement using re-reference interval prediction (RRIP)
Proceedings of the 37th annual international symposium on Computer architecture
Accelerating multicore reuse distance analysis with sampling and parallelization
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Sampling Dead Block Prediction for Last-Level Caches
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Vantage: scalable and efficient fine-grain cache partitioning
Proceedings of the 38th annual international symposium on Computer architecture
Bypass and insertion algorithms for exclusive last-level caches
Proceedings of the 38th annual international symposium on Computer architecture
NUcache: An efficient multicore cache organization based on Next-Use distance
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Dynamic access distance driven cache replacement
ACM Transactions on Architecture and Code Optimization (TACO)
CRUISE: cache replacement and utility-aware scheduling
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
SHiP: signature-based hit predictor for high performance caching
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
PACMan: prefetch-aware cache management for high performance caching
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Insertion and promotion for tree-based PseudoLRU last-level caches
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Cache management policies such as replacement, bypass, or shared cache partitioning have been relying on data reuse behavior to predict the future. This paper proposes a new way to use dynamic reuse distances to further improve such policies. A new replacement policy is proposed which prevents replacing a cache line until a certain number of accesses to its cache set, called a Protecting Distance (PD). The policy protects a cache line long enough for it to be reused, but not beyond that to avoid cache pollution. This can be combined with a bypass mechanism that also relies on dynamic reuse analysis to bypass lines with less expected reuse. A miss fetch is bypassed if there are no unprotected lines. A hit rate model based on dynamic reuse history is proposed and the PD that maximizes the hit rate is dynamically computed. The PD is recomputed periodically to track a program's memory access behavior and phases. Next, a new multi-core cache partitioning policy is proposed using the concept of protection. It manages lifetimes of lines from different cores (threads) in such a way that the overall hit rate is maximized. The average per-thread lifetime is reduced by decreasing the thread's PD. The single-core PD-based replacement policy with bypass achieves an average speedup of 4.2% over the DIP policy, while the average speedups over DIP are 1.5% for dynamic RRIP (DRRIP) and 1.6% for sampling dead-block prediction (SDP). The 16-core PD-based partitioning policy improves the average weighted IPC by 5.2%, throughput by 6.4% and fairness by 9.9% over thread-aware DRRIP (TA-DRRIP). The required hardware is evaluated and the overhead is shown to be manageable.