Less reused filter: improving L2 cache performance via filtering less reused lines

Authors:
Lingxiang Xiang;Tianzhou Chen;Qingsong Shi;Wei Hu
Affiliations:
Zhejiang University, Hangzhou, China;Zhejiang University, Hangzhou, China;Zhejiang University, Hangzhou, China;Zhejiang University, Hangzhou, China
Venue:
Proceedings of the 23rd international conference on Supercomputing
Year:
2009

Abstract

The L2 cache is commonly managed with an LRU policy. For workloads whose working set is larger than the L2 cache, LRU behaves poorly and produces a large number of less reused lines, that is, lines that are never reused or are reused only a few times. In this case, cache performance can be improved by retaining a portion of the working set in the cache for a sufficiently long period. Previous schemes approach this by bypassing never reused lines. However, they are severely constrained by the number of never reused lines and sometimes deliver no benefit when such lines are scarce. This paper proposes a new filtering mechanism that filters out less reused lines rather than only never reused lines. The extended scope of bypassing provides more opportunities to fit the working set into the cache. This paper also proposes the Less Reused Filter (LRF), a separate structure that precedes the L2 cache, to implement this mechanism. LRF employs a reuse frequency predictor to accurately identify the less reused lines among incoming lines. Meanwhile, based on our observation that most less reused lines have a short life span, LRF places the filtered lines into a small filter buffer to fully utilize them, avoiding extra misses. Our evaluation on 24 SPEC 2000 benchmarks shows that augmenting a 512KB LRU-managed L2 cache with an LRF that has a 32KB filter buffer reduces the average MPKI by 27.5%, narrowing the gap between LRU and OPT by 74.4%.
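
The filtering idea described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's design: the abstract does not say how the predictor is indexed or trained, so the sketch simply remembers, per line address, how many reuses a line saw during its previous residency. The names `LRUStore`, `FilteredCache`, `threshold`, and `count_misses` are hypothetical, and both the cache and the filter buffer are modeled as fully associative for brevity.

```python
from collections import OrderedDict


class LRUStore:
    """Fully associative LRU store: line address -> reuse count (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def hit(self, addr):
        if addr not in self.lines:
            return False
        self.lines[addr] += 1          # one more reuse during this residency
        self.lines.move_to_end(addr)   # mark as most recently used
        return True

    def insert(self, addr):
        """Insert addr; return the evicted (addr, reuses) pair, or None."""
        victim = None
        if len(self.lines) >= self.capacity:
            victim = self.lines.popitem(last=False)  # evict the LRU line
        self.lines[addr] = 0
        return victim


class FilteredCache:
    """L2 preceded by an optional filter buffer (assumed structure)."""

    def __init__(self, l2_lines, filter_lines, threshold=1):
        self.l2 = LRUStore(l2_lines)
        self.buf = LRUStore(filter_lines) if filter_lines else None
        self.threshold = threshold     # assumed cutoff for "less reused"
        self.predicted_reuse = {}      # assumed predictor: last observed reuses
        self.misses = 0

    def access(self, addr):
        if self.l2.hit(addr) or (self.buf is not None and self.buf.hit(addr)):
            return
        self.misses += 1
        # Lines with no history default to "reused" and go to L2 (assumption).
        predicted = self.predicted_reuse.get(addr, self.threshold + 1)
        less_reused = self.buf is not None and predicted <= self.threshold
        target = self.buf if less_reused else self.l2
        victim = target.insert(addr)
        if victim is not None:
            # Train the predictor with the reuse count seen at eviction.
            self.predicted_reuse[victim[0]] = victim[1]


def count_misses(trace, l2_lines, filter_lines):
    cache = FilteredCache(l2_lines, filter_lines)
    for addr in trace:
        cache.access(addr)
    return cache.misses


# Cyclic sweep over 12 lines with an 8-line cache: the working set does not fit.
trace = list(range(12)) * 10
print(count_misses(trace, l2_lines=8, filter_lines=0))  # plain LRU → 120
print(count_misses(trace, l2_lines=8, filter_lines=2))  # with filter → 48
```

In this toy trace, plain LRU misses on every access because each line is evicted just before it is needed again. With the filter, lines that showed no reuse on their first residency are diverted to the small buffer, so the eight lines already in the cache stay resident and hit on every later pass. The sketch has no mechanism for re-learning a line whose behavior changes, which a real predictor would need.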