Simple penalty-sensitive replacement policies for caches

Authors:
Jaeheon Jeong;Per Stenström;Michel Dubois
Affiliations:
Intel, Hillsboro, OR;Chalmers University of Technology Gothenburg, SWEDEN;University of Southern California, Los Angeles, CA
Venue:
Proceedings of the 3rd conference on Computing frontiers
Year:
2006

Citing 23
Cited 2

Cache Operations by MRU Change

IEEE Transactions on Computers
High-performance computer architecture (2nd ed.)

High-performance computer architecture (2nd ed.)
Alternative implementations of two-level adaptive branch prediction

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Stride directed prefetching in scalar processors

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
An adaptive cache coherence protocol optimized for migratory sharing

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Cache write policies and performance

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Data prefetching for high-performance processors

Data prefetching for high-performance processors
Tolerating latency through software-controlled data prefetching

Tolerating latency through software-controlled data prefetching
A data cache with multiple caching strategies tuned to different types of locality

ICS '95 Proceedings of the 9th international conference on Supercomputing
A modified approach to data cache management

Proceedings of the 28th annual international symposium on Microarchitecture
Predictability of load/store instruction latencies

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Run-time spatial locality detection and optimization

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Using prediction to accelerate coherence protocols

Proceedings of the 25th annual international symposium on Computer architecture
Optimal replacements in caches with two miss costs

Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Hardware identification of cache conflict misses

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Selective, accurate, and timely self-invalidation using last-touch prediction

Proceedings of the 27th annual international symposium on Computer architecture
Cache Memories

ACM Computing Surveys (CSUR)
Locality vs. criticality

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
The Effect of using State-Based Priority Information in a Shared-Memory Multiprocessor Cache Replacement Policy

ICPP '98 Proceedings of the 1998 International Conference on Parallel Processing
Design Issues and Tradeoffs for Write Buffers

HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Improving CC-NUMA Performance Using Instruction-Based Prediction

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Cost-Sensitive Cache Replacement Algorithms

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Cost-sensitive cache replacement algorithms

Cost-sensitive cache replacement algorithms

Cache Replacement Algorithms with Nonuniform Miss Costs

IEEE Transactions on Computers
Instruction-based reuse-distance prediction for effective cache management

SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation

Quantified Score

Hi-index	0.00

Visualization

Abstract

Classic cache replacement policies assume that miss costs are uniform. However, the correlation between miss rate and cache performance is not as straightforward as it used to be. Ultimately, the true cost measure of a miss should be the penalty, i.e. the actual processing bandwidth lost because of the miss. It is known that, contrary to loads, the penalty of stores is mostly hidden in modern processors. To take advantage of this observation, we propose simple schemes to replace load misses by store misses. We extend classic replacement algorithms such as LRU (Least Recently Used) and PLRU (Partial LRU) to reduce the aggregate miss penalty instead of the miss count.One key issue is to predict the next access type to a block, so that higher replacement priority is given to blocks that will be accessed next with a store. We introduce and evaluate various prediction schemes based on instructions, and broadly inspired from branch predictors. To guide the design we run extensive trace-driven simulations on eight Spec95 benchmarks with a wide range of cache configurations and observe that our simple penalty-sensitive policies yield positive load miss improvements over classic algorithms across most the benchmarks and cache configurations. In some cases the improvements are very large.