LRU-PEA: a smart replacement policy for non-uniform cache architectures on chip multiprocessors

Authors:
Javier Lira;Carlos Molina;Antonio González
Affiliations:
Universitat Politècnica de Catalunya;Universitat Rovira i Virgili;Intel Barcelona Research Center, Intel Labs-UPC
Venue:
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Year:
2009

Citing 20
Cited 1

Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
Cache Memories

ACM Computing Surveys (CSUR)
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Simics: A Full System Simulation Platform

Computer
Best of Both Latency and Throughput

ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Managing Wire Delay in Large Chip-Multiprocessor Caches

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
SF-LRU Cache Replacement Algorithm

MTDT '04 Proceedings of the Records of the 2004 International Workshop on Memory Technology, Design and Testing
A NUCA substrate for flexible CMP cache sharing

Proceedings of the 19th annual international conference on Supercomputing
Counter-Based Cache Replacement Algorithms

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
SimFlex: Statistical Sampling of Computer System Simulation

IEEE Micro
Interconnect design considerations for large NUCA caches

Proceedings of the 34th annual international symposium on Computer architecture
Adaptive insertion policies for high performance caching

Proceedings of the 34th annual international symposium on Computer architecture
Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Analysis of static and dynamic energy consumption in NUCA caches: initial results

MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
An LRU-based replacement algorithm augmented with frequency of access in shared chip-multiprocessor caches

ACM SIGARCH Computer Architecture News
Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
A novel migration-based NUCA design for chip multiprocessors

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
The PARSEC benchmark suite: characterization and architectural implications

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
A study of replacement algorithms for a virtual-storage computer

IBM Systems Journal

LP-NUCA: networks-in-cache for high-performance low-power embedded processors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

The increasing speed-gap between processor and memory and the limited memory bandwidth make last-level cache performance crucial for CMP architectures. Non Uniform Cache Architectures (NUCA) have been introduced to deal with this problem. This memory organization divides the whole memory space into smaller pieces or banks allowing nearer banks to have better access latencies than further banks.Moreover, an adaptive replacement policy that efficiently reduces misses in the last-level cache could boost performance, particularly if set associativity is adopted. Unfortunately, traditional replacement policies do not behave properly as they were designed for single-processors. This paper focuses on Bank Replacement. This policy involves three key decisions when there is a miss: where to place a data block within the cache set, which data to evict from the cache set and finally, where to place the evicted data. We propose a novel replacement technique that enables more intelligent replacement decisions to be taken. This technique is based on the observation that some types of data are less commonly accessed depending on which bank they reside in. We call this technique LRU-PEA (Least Recently Used with a Priority Eviction Approach). We show that the proposed technique significantly reduces the requests to the off-chip memory by increasing the hit ratio in the NUCA cache. This translates into an average IPC improvement of 8% and into an Energy per Instruction (EPI) reduction of 5%.