Performance evaluation of cache replacement policies for the SPEC CPU2000 benchmark suite

Authors:
Hussein Al-Zoubi;Aleksandar Milenkovic;Milena Milenkovic
Affiliations:
The University of Alabama in Huntsville;The University of Alabama in Huntsville;The University of Alabama in Huntsville
Venue:
ACM-SE 42 Proceedings of the 42nd annual Southeast regional conference
Year:
2004

Abstract

Replacement policy, one of the key factors determining the effectiveness of a cache, becomes even more important with the latest technological trends toward highly associative caches. State-of-the-art processors employ various policies, such as Random, Least Recently Used (LRU), Round-Robin, and Pseudo-LRU (PLRU), indicating that there is no consensus about which one is best. The optimal yet unattainable policy would replace, among all memory blocks present in the set, the block whose next reference lies farthest in the future.

In our quest for a replacement policy as close to optimal as possible, we thoroughly explored the design space of existing replacement mechanisms using the SimpleScalar toolset and the SPEC CPU2000 benchmark suite, across a wide range of cache sizes and organizations. To better understand the behavior of the different policies, we introduced new measures, such as the cumulative distribution of cache hits in the LRU stack. We also dynamically monitored the number of cache misses per 100,000 instructions.

Our results show that the PLRU techniques can approximate and even outperform LRU with much lower complexity, for a wide range of cache organizations. However, a relatively large gap between LRU and the optimal replacement policy, of up to 50%, indicates that new research aimed at closing the gap is necessary. The cumulative distribution of cache hits in the LRU stack indicates very good potential for way prediction using LRU information, since the percentage of hits to the bottom of the LRU stack is relatively high.
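
The policies named in the abstract can be illustrated with a minimal Python sketch of a single cache set. This is not the authors' SimpleScalar setup. The function names, the toy trace, and the convention that LRU stack position 0 holds the most recently used block are all assumptions made for illustration. The abstract does not say which PLRU variants were studied, so the sketch assumes the common tree-based form.

```python
from collections import Counter

def simulate_lru(trace, ways):
    """Simulate one LRU set; return (misses, hits per LRU-stack position)."""
    stack = []                          # stack[0] = most recently used (assumed convention)
    misses, depth_hits = 0, Counter()
    for block in trace:
        if block in stack:
            depth_hits[stack.index(block)] += 1
            stack.remove(block)
        else:
            misses += 1
            if len(stack) == ways:
                stack.pop()             # evict the least recently used block
        stack.insert(0, block)
    return misses, depth_hits

def cumulative_hit_distribution(depth_hits, ways):
    """Fraction of all hits found at stack position <= p, for each p."""
    total = sum(depth_hits.values())
    running, result = 0, []
    for pos in range(ways):
        running += depth_hits.get(pos, 0)
        result.append(running / total if total else 0.0)
    return result

def simulate_opt(trace, ways):
    """Optimal policy: evict the block whose next reference is farthest away."""
    resident, misses = set(), 0
    for i, block in enumerate(trace):
        if block in resident:
            continue
        misses += 1
        if len(resident) == ways:
            def next_use(b):
                try:
                    return trace.index(b, i + 1)
                except ValueError:
                    return float("inf")  # never referenced again
            resident.remove(max(resident, key=next_use))
        resident.add(block)
    return misses

def simulate_plru(trace, ways):
    """Tree-based PLRU (assumed variant); ways must be a power of two."""
    lines = [None] * ways
    bits = [0] * (ways - 1)             # heap-ordered tree; 0 = victim on left, 1 = on right
    misses = 0

    def touch(way):                     # point every node on the path away from `way`
        node, lo, hi = 0, 0, ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                bits[node] = 1
                node, hi = 2 * node + 1, mid
            else:
                bits[node] = 0
                node, lo = 2 * node + 2, mid

    def victim():                       # follow the bits down to the way to replace
        node, lo, hi = 0, 0, ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo

    for block in trace:
        if block in lines:
            way = lines.index(block)
        else:
            misses += 1
            way = lines.index(None) if None in lines else victim()  # fill empty ways first
            lines[way] = block
        touch(way)
    return misses

TRACE = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]    # hypothetical block-address trace
lru_misses, depth_hits = simulate_lru(TRACE, 4)
print("LRU ", lru_misses, cumulative_hit_distribution(depth_hits, 4))
print("PLRU", simulate_plru(TRACE, 4))
print("OPT ", simulate_opt(TRACE, 4))
```

On this toy trace with a 4-way set, the sketch reports 8 misses for LRU, 7 for tree-based PLRU, and 6 for the optimal policy. That is a small instance of the two effects the abstract describes: PLRU occasionally beating LRU, and a gap remaining between LRU and optimal. It says nothing about the magnitude of those effects on SPEC CPU2000, which is what the study itself measures.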