Optimal bypass monitor for high performance last-level caches

Authors:
Lingda Li;Dong Tong;Zichao Xie;Junlin Lu;Xu Cheng
Affiliations:
Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China
Venue:
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Year:
2012

Citing 29
Cited 1

A modified approach to data cache management

Proceedings of the 28th annual international symposium on Microarchitecture
Utilizing reuse information in data cache management

ICS '98 Proceedings of the 12th international conference on Supercomputing
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Run-Time Cache Bypassing

IEEE Transactions on Computers
Dead-block prediction & dead-block correlating prefetchers

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Timekeeping in the memory system: predicting and optimizing memory behavior

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Pollution control caching

ICCD '95 Proceedings of the 1995 International Conference on Computer Design: VLSI in Computers and Processors
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Case for MLP-Aware Cache Replacement

Proceedings of the 33rd annual international symposium on Computer Architecture
SPEC CPU2006 benchmark descriptions

ACM SIGARCH Computer Architecture News
Adaptive Caches: Effective Shaping of Cache Behavior to Workloads

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive insertion policies for high performance caching

Proceedings of the 34th annual international symposium on Computer architecture
Emulating Optimal Replacement with a Shepherd Cache

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Counter-Based Cache Replacement and Bypassing Algorithms

IEEE Transactions on Computers
Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Less reused filter: improving l2 cache performance via filtering less reused lines

Proceedings of the 23rd international conference on Supercomputing
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches

Proceedings of the 36th annual international symposium on Computer architecture
A study of replacement algorithms for a virtual-storage computer

IBM Systems Journal
Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
High performance cache replacement using re-reference interval prediction (RRIP)

Proceedings of the 37th annual international symposium on Computer architecture
Using dead blocks as a virtual victim cache

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Sampling Dead Block Prediction for Last-Level Caches

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
The future of microprocessors

Communications of the ACM
ARC: a self-tuning, low overhead replacement cache

FAST'03 Proceedings of the 2nd USENIX conference on File and storage technologies
CAR: clock with adaptive replacement

FAST'04 Proceedings of the 3rd USENIX conference on File and storage technologies
Bypass and insertion algorithms for exclusive last-level caches

Proceedings of the 38th annual international symposium on Computer architecture
NUcache: An efficient multicore cache organization based on Next-Use distance

HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
SHiP: signature-based hit predictor for high performance caching

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
PACMan: prefetch-aware cache management for high performance caching

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture

The reuse cache: downsizing the shared last-level cache

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

In the last-level cache, large amounts of blocks have reuse distances greater than the available cache capacity. Cache performance and efficiency can be improved if some subset of these distant reuse blocks can reside in the cache longer. The bypass technique is an effective and attractive solution that prevents the insertion of harmful blocks. Our analysis shows that bypass can contribute significant performance improvement, and the optimal bypass can achieve similar performance compared to OPT+B, which is the theoretical optimal replacement policy. Thus, we propose a bypass technique called Optimal Bypass Monitor (OBM), which makes bypass decisions by learning and predicting the behavior of the optimal bypass. OBM keeps a short global track of the incoming-victim block pairs. By detecting the first reuse block in each pair, the behavior of the optimal bypass on the track can be asserted to guide the bypass choice. Any existing replacement policy can be extended with OBM while requiring negligible design modification. Our experimental results show that using less than 1.5KB extra memory, OBM with the NRU replacement policy outperforms LRU by 9.7% and 8.9% for single-thread and multi-programmed workloads respectively. Compared with other state-of-the-art proposals such as DRRIP and SDBP, it achieves superior performance with less storage overhead.