Enhancing last-level cache performance by block bypassing and early miss determination

Authors:
Haakon Dybdahl;Per Stenström
Affiliations:
Dept. of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway;Dept. of Computer Engineering, Dept. of Computer Engineering, Chalmers University of Technology, Goteborg, Sweden
Venue:
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Year:
2006

Citing 12
Cited 4

Cache replacement with dynamic exclusion

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Efficient simulation of caches under optimal replacement with applications to miss characterization

SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
A modified approach to data cache management

Proceedings of the 28th annual international symposium on Microarchitecture
Predictability of load/store instruction latencies

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Utilizing reuse information in data cache management

ICS '98 Proceedings of the 12th international conference on Supercomputing
Run-Time Cache Bypassing

IEEE Transactions on Computers
SimpleScalar: An Infrastructure for Computer System Modeling

Computer
Just Say No: Benefits of Early Cache Miss Determination

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Design and performance evaluation of a cache assist to implement selective caching

ICCD '97 Proceedings of the 1997 International Conference on Computer Design (ICCD '97)
Self-correcting LRU replacement policies

Proceedings of the 1st conference on Computing frontiers
Static Identification of Delinquent Loads

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A study of replacement algorithms for a virtual-storage computer

IBM Systems Journal

An LRU-based replacement algorithm augmented with frequency of access in shared chip-multiprocessor caches

ACM SIGARCH Computer Architecture News
Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Code-based cache partitioning for improving hardware cache performance

Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication
Exploiting single-usage for effective memory management

ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

While bypassing algorithms have been applied to the first-level cache, we study for the first time their effectiveness for the last-level caches for which miss penalties are significantly higher and where algorithm complexity is not constrained by the speed of the pipeline. Our algorithm monitors the reuse behavior of blocks that are touched by delinquent loads and re-classify them on-the-fly. Blocks classified as bypassed are only installed in the level-1 cache. We leverage the algorithm to early send out a miss request for loads expected to request blocks classified to be bypassed. Such requests are sent to memory directly without tag checks at intermediary levels in the cache hierarchy. Overall, we find that we can robustly reduce the miss rate by 23% and improve IPC with 14% on average for memory bound SPEC2000 applications without degrading performance of the other SPEC2000 applications.