Combining recency of information with selective random and a victim cache in last-level caches

Authors:
Alejandro Valero;Julio Sahuquillo;Salvador Petit;Pedro López;José Duato
Affiliations:
Universitat Politècnica de València, Valencia, Spain;Universitat Politècnica de València, Valencia, Spain;Universitat Politècnica de València, Valencia, Spain;Universitat Politècnica de València, Valencia, Spain;Universitat Politècnica de València, Valencia, Spain
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2012

Abstract

Memory latency has become an important performance bottleneck in current microprocessors, and the problem worsens as the number of cores sharing the same memory controller increases. A common way to mitigate it is to implement cache hierarchies with large or very large Last-Level Caches (LLCs). LLCs are implemented with a high number of ways (e.g., 16) to reduce conflict misses. Caches have typically implemented the LRU algorithm to exploit temporal locality, but its performance drifts away from the optimal as the number of ways increases. In addition, a strict LRU implementation is costly in terms of area and power. This article focuses on a family of low-cost replacement strategies whose implementation scales with the number of ways while maintaining performance. The proposed strategies track the access order of just a few blocks, which cannot be replaced. The victim is randomly selected among the blocks exhibiting poor locality. Although the random policy generally helps improve performance, in some applications the scheme performs worse than LRU. This drawback can be overcome by adding a small victim cache to the large LLC. Experimental results show that the best version of the family without a victim cache reduces MPKI by between 10% and 11% compared to a set of the most representative state-of-the-art algorithms, and by up to 22% compared to LRU. The proposal with a victim cache achieves an average speedup of 4% over LRU. In addition, it reduces dynamic energy, on average, by up to 8%. Finally, the baseline algorithm of the family greatly reduces hardware complexity compared to the studied algorithms.
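
The mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names (`RecencyRandomSet`, `VictimCache`), the `lookup` helper, the number of tracked blocks, the victim cache size, and its FIFO eviction are all assumptions chosen for illustration. The sketch only shows the general idea of protecting a few recently accessed blocks, picking the victim at random among the remaining ways, and keeping evicted blocks in a small victim cache.

```python
import random
from collections import OrderedDict


class RecencyRandomSet:
    """One cache set that orders only its `tracked` most recent blocks (hypothetical)."""

    def __init__(self, ways=16, tracked=2, rng=None):
        assert 0 < tracked < ways
        self.ways = ways
        self.tracked = tracked
        self.blocks = [None] * ways          # tag held by each way
        self.recent = []                     # protected way indices, most recent first
        self.rng = rng or random.Random(0)   # fixed seed is an assumption for repeatability

    def _touch(self, way):
        # Move the way to the front of the short recency list and trim the list.
        if way in self.recent:
            self.recent.remove(way)
        self.recent.insert(0, way)
        del self.recent[self.tracked:]

    def access(self, tag):
        """Return (hit, evicted_tag); evicted_tag is None if nothing was evicted."""
        if tag in self.blocks:
            self._touch(self.blocks.index(tag))
            return True, None
        if None in self.blocks:
            way = self.blocks.index(None)    # fill an empty way first
        else:
            # Victim is chosen at random among the ways that are not protected.
            candidates = [w for w in range(self.ways) if w not in self.recent]
            way = self.rng.choice(candidates)
        evicted = self.blocks[way]
        self.blocks[way] = tag
        self._touch(way)
        return False, evicted


class VictimCache:
    """Small fully associative buffer with FIFO eviction (an assumed policy)."""

    def __init__(self, entries=4):
        self.entries = entries
        self.store = OrderedDict()

    def insert(self, tag):
        self.store[tag] = True
        if len(self.store) > self.entries:
            self.store.popitem(last=False)   # drop the oldest entry

    def extract(self, tag):
        # Remove the tag if present and report whether it was found.
        return self.store.pop(tag, None) is not None


def lookup(cache_set, victim_cache, tag):
    """Return 'hit', 'victim_hit', or 'miss' for one access."""
    hit, evicted = cache_set.access(tag)
    if hit:
        return "hit"
    in_victim = victim_cache.extract(tag)
    if evicted is not None:
        victim_cache.insert(evicted)
    return "victim_hit" if in_victim else "miss"
```

In this sketch, a block evicted by an unlucky random choice can still be recovered from the victim cache on its next reference, which is the role the abstract assigns to the victim cache.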