Where replacement algorithms fail: a thorough analysis

Authors:
Georgios Keramidas;Pavlos Petoumenos;Stefanos Kaxiras
Affiliations:
Industrial Systems Institute, Patras, Greece;University of Patras, Patras, Greece;University of Patras, Patras, Greece
Venue:
Proceedings of the 7th ACM international conference on Computing frontiers
Year:
2010

Abstract

Cache placement and eviction, especially at the last level of the memory hierarchy, have recently attracted a flurry of research activity. The common perception that LRU is a well-performing algorithm has been discredited: many researchers have turned their attention to more sophisticated algorithms that can substantially improve cache performance. In this paper, we thoroughly examine four recently proposed replacement policies: the Dynamic Insertion Policy (DIP), the Shepherd Cache (SC), MLP-aware replacement, and Instruction-based Reuse Distance Prediction (IbRDP) replacement. Our experimental studies show a great inconsistency between the number of misses saved by each mechanism and the resulting improvement in IPC. This is particularly true for the DIP and SC approaches, and it attests to the fact that these algorithms do not take into account the relative cost of each miss (i.e., whether it is an isolated or a parallel miss); their aim is to blindly lower the total number of misses. On the other hand, MLP-aware replacement, although miss-cost-aware, cannot efficiently handle workloads that display LRU-hostile behavior and thus fails to reduce execution time even when there are ample opportunities to reduce cache misses. The IbRDP replacement policy both handles non-LRU access patterns and is MLP-friendly, which leads to greater consistency between the reduction in misses and the corresponding increase in performance, and thus to the largest IPC improvement among the studied mechanisms. So, what are the appropriate characteristics of a replacement algorithm targeting the lower levels of the memory hierarchy? In this paper, we shed some light on this question.
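
To make the notion of LRU-hostile behavior concrete, the following minimal sketch simulates a single fully associative cache set under a cyclic access pattern that is slightly larger than the set. It is an illustration under stated assumptions, not the simulation infrastructure or any policy implementation from the paper: the function name `simulate`, the 4-way set, and the 5-block loop are all hypothetical choices, and inserting new lines at the LRU position is only a simplified static stand-in for insertion-policy ideas such as DIP, which chooses its insertion behavior dynamically.

```python
def simulate(trace, ways, insert_at_mru):
    """Count misses for one fully associative set (hypothetical helper).

    The set is kept as a recency stack: index 0 is MRU, the last
    element is LRU. Hits always promote the block to MRU.
    """
    stack = []
    misses = 0
    for block in trace:
        if block in stack:
            # Hit: move the block to the MRU position.
            stack.remove(block)
            stack.insert(0, block)
        else:
            misses += 1
            if len(stack) == ways:
                stack.pop()  # evict the block in the LRU position
            if insert_at_mru:
                stack.insert(0, block)  # classic LRU insertion
            else:
                stack.append(block)  # assumed variant: insert at LRU position


    return misses


# Cyclic working set of 5 blocks on a 4-way set, repeated 10 times.
trace = [0, 1, 2, 3, 4] * 10

lru_misses = simulate(trace, ways=4, insert_at_mru=True)
lip_misses = simulate(trace, ways=4, insert_at_mru=False)

print(f"LRU insertion misses: {lru_misses} / {len(trace)}")
print(f"LRU-position insertion misses: {lip_misses} / {len(trace)}")
```

With this trace, classic LRU evicts each block just before it is reused, so every access misses, while inserting at the LRU position retains part of the working set. The sketch counts misses only; it says nothing about the cost of each miss (isolated versus parallel), which is the second dimension the abstract argues a replacement policy must consider.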