The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A data cache with multiple caching strategies tuned to different types of locality
ICS '95 Proceedings of the 9th international conference on Supercomputing
A modified approach to data cache management
Proceedings of the 28th annual international symposium on Microarchitecture
Utilizing reuse information in data cache management
ICS '98 Proceedings of the 12th international conference on Supercomputing
IEEE Transactions on Computers
Selective, accurate, and timely self-invalidation using last-touch prediction
Proceedings of the 27th annual international symposium on Computer architecture
Dead-block prediction & dead-block correlating prefetchers
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Drowsy caches: simple techniques for reducing leakage power
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Timekeeping in the memory system: predicting and optimizing memory behavior
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Modeling Live and Dead Lines in Cache Memory Systems
IEEE Transactions on Computers
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Using the Compiler to Improve Cache Replacement Decisions
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
TCP: Tag Correlating Prefetchers
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Self-correcting LRU replacement policies
Proceedings of the 1st conference on Computing frontiers
Memory coherence activity prediction in commercial workloads
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Cooperative Caching with Keep-Me and Evict-Me
INTERACT '05 Proceedings of the 9th Annual Workshop on Interaction between Compilers and Computer Architectures
IATAC: a smart predictor to turn-off L2 cache lines
ACM Transactions on Architecture and Code Optimization (TACO)
Hardware techniques to reduce communication costs in multiprocessors
Hardware techniques to reduce communication costs in multiprocessors
Adaptive insertion policies for high performance caching
Proceedings of the 34th annual international symposium on Computer architecture
Counter-Based Cache Replacement and Bypassing Algorithms
IEEE Transactions on Computers
Less reused filter: improving l2 cache performance via filtering less reused lines
Proceedings of the 23rd international conference on Supercomputing
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
Proceedings of the 36th annual international symposium on Computer architecture
Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
High performance cache replacement using re-reference interval prediction (RRIP)
Proceedings of the 37th annual international symposium on Computer architecture
Using dead blocks as a virtual victim cache
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Sampling Dead Block Prediction for Last-Level Caches
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
STEM: Spatiotemporal Management of Capacity for Intra-core Last Level Caches
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Management policies analysis for multi-core shared caches
ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications - Volume Part II
Bypass and insertion algorithms for exclusive last-level caches
Proceedings of the 38th annual international symposium on Computer architecture
Evaluating placement policies for managing capacity sharing in CMP architectures with private caches
ACM Transactions on Architecture and Code Optimization (TACO)
SHiP: signature-based hit predictor for high performance caching
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Unified memory optimizing architecture: memory subsystem control with a unified predictor
Proceedings of the 26th ACM international conference on Supercomputing
Improving writeback efficiency with decoupled last-write prediction
Proceedings of the 39th Annual International Symposium on Computer Architecture
Combining recency of information with selective random and a victim cache in last-level caches
ACM Transactions on Architecture and Code Optimization (TACO)
Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Optimal bypass monitor for high performance last-level caches
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
The evicted-address filter: a unified mechanism to address both cache pollution and thrashing
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Exploiting reuse locality on inclusive shared last-level caches
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Improving Cache Management Policies Using Dynamic Reuse Distances
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
The locality-aware adaptive cache coherence protocol
Proceedings of the 40th Annual International Symposium on Computer Architecture
Managing shared last-level cache in a heterogeneous multicore processor
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Low-energy volatile STT-RAM cache design using cache-coherence-enabled adaptive refresh
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A Buffered Dual-Access-Mode Scheme Designed for Low-Power Highly-Associative Caches
International Journal of Embedded and Real-Time Communication Systems
Insertion and promotion for tree-based PseudoLRU last-level caches
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
The reuse cache: downsizing the shared last-level cache
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Efficient management of last-level caches in graphics processors for 3D scene rendering workloads
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Virtually split cache: An efficient mechanism to distribute instructions and data
ACM Transactions on Architecture and Code Optimization (TACO)
An efficient compiler framework for cache bypassing on GPUs
Proceedings of the International Conference on Computer-Aided Design
An effectiveness-based adaptive cache replacement policy
Microprocessors & Microsystems
Hi-index | 0.01 |
Data caches in general-purpose microprocessors often contain mostly dead blocks and are thus used inefficiently. To improve cache efficiency, dead blocks should be identified and evicted early. Prior schemes predict the death of a block immediately after it is accessed; however, these schemes yield lower prediction accuracy and coverage. Instead, we find that predicting the death of a block when it just moves out of the MRU position gives the best tradeoff between timeliness and prediction accuracy/coverage. Furthermore, the individual reference history of a block in the L1 cache can be irregular because of data/control dependence. This paper proposes a new class of dead-block predictors that predict dead blocks based on bursts of accesses to a cache block. A cache burst begins when a block becomes MRU and ends when it becomes non-MRU. Cache bursts are more predictable than individual references because they hide the irregularity of individual references. When used at the L1 cache, the best burst-based predictor can identify 96% of the dead blocks with a 96% accuracy. With the improved dead-block predictors, we evaluate three ways to increase cache efficiency by eliminating dead blocks early: replacement optimization, bypassing, and prefetching. The most effective approach, prefetching into dead blocks, increases the average L1 efficiency from 8% to 17% and the L2 efficiency from 17% to 27%. This increased cache efficiency translates into higher overall performance: prefetching into dead blocks outperforms the same prefetch scheme without dead-block prediction by 12% at the L1 and by 13% at the L2.