A modified approach to data cache management
Proceedings of the 28th annual international symposium on Microarchitecture
Utilizing reuse information in data cache management
ICS '98 Proceedings of the 12th international conference on Supercomputing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
IEEE Transactions on Computers
Dead-block prediction & dead-block correlating prefetchers
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Timekeeping in the memory system: predicting and optimizing memory behavior
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
ICCD '95 Proceedings of the 1995 International Conference on Computer Design: VLSI in Computers and Processors
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Case for MLP-Aware Cache Replacement
Proceedings of the 33rd annual international symposium on Computer Architecture
SPEC CPU2006 benchmark descriptions
ACM SIGARCH Computer Architecture News
Adaptive Caches: Effective Shaping of Cache Behavior to Workloads
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive insertion policies for high performance caching
Proceedings of the 34th annual international symposium on Computer architecture
Emulating Optimal Replacement with a Shepherd Cache
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Counter-Based Cache Replacement and Bypassing Algorithms
IEEE Transactions on Computers
Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Less reused filter: improving l2 cache performance via filtering less reused lines
Proceedings of the 23rd international conference on Supercomputing
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
Proceedings of the 36th annual international symposium on Computer architecture
A study of replacement algorithms for a virtual-storage computer
IBM Systems Journal
Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
High performance cache replacement using re-reference interval prediction (RRIP)
Proceedings of the 37th annual international symposium on Computer architecture
Using dead blocks as a virtual victim cache
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Sampling Dead Block Prediction for Last-Level Caches
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Communications of the ACM
ARC: a self-tuning, low overhead replacement cache
FAST'03 Proceedings of the 2nd USENIX conference on File and storage technologies
CAR: clock with adaptive replacement
FAST'04 Proceedings of the 3rd USENIX conference on File and storage technologies
Bypass and insertion algorithms for exclusive last-level caches
Proceedings of the 38th annual international symposium on Computer architecture
NUcache: An efficient multicore cache organization based on Next-Use distance
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
SHiP: signature-based hit predictor for high performance caching
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
PACMan: prefetch-aware cache management for high performance caching
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
The reuse cache: downsizing the shared last-level cache
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
In the last-level cache, large amounts of blocks have reuse distances greater than the available cache capacity. Cache performance and efficiency can be improved if some subset of these distant reuse blocks can reside in the cache longer. The bypass technique is an effective and attractive solution that prevents the insertion of harmful blocks. Our analysis shows that bypass can contribute significant performance improvement, and the optimal bypass can achieve similar performance compared to OPT+B, which is the theoretical optimal replacement policy. Thus, we propose a bypass technique called Optimal Bypass Monitor (OBM), which makes bypass decisions by learning and predicting the behavior of the optimal bypass. OBM keeps a short global track of the incoming-victim block pairs. By detecting the first reuse block in each pair, the behavior of the optimal bypass on the track can be asserted to guide the bypass choice. Any existing replacement policy can be extended with OBM while requiring negligible design modification. Our experimental results show that using less than 1.5KB extra memory, OBM with the NRU replacement policy outperforms LRU by 9.7% and 8.9% for single-thread and multi-programmed workloads respectively. Compared with other state-of-the-art proposals such as DRRIP and SDBP, it achieves superior performance with less storage overhead.