Replacement techniques for dynamic NUCA cache designs on CMPs

Authors:
Javier Lira;Carlos Molina;Ryan N. Rakvic;Antonio González
Affiliations:
Intel Barcelona Research Center, Intel Labs--UPC, Barcelona, Spain 2908034;Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili, Tarragona, Spain 2643007;Electrical Engineering Department, United States Naval Academy, 105 Maryland Avenue Annapolis, USA 21402-5025;Intel Barcelona Research Center, Intel Labs--UPC, Barcelona, Spain 2908034
Venue:
The Journal of Supercomputing
Year:
2013



Abstract

The growing influence of wire delay in cache design means that access latencies to last-level cache banks are no longer constant. Non-Uniform Cache Architectures (NUCAs) have been proposed to address this problem. An efficient last-level cache is also crucial in chip multiprocessor (CMP) architectures, because the significant speed gap between processor and memory makes every request to off-chip memory costly. A bank replacement policy that manages the NUCA cache efficiently is therefore desirable. However, the decentralized nature of NUCA undermines the effectiveness of replacement policies: banks operate independently of each other, so each replacement decision is restricted to a single NUCA bank. In this paper, we propose three different techniques to deal with replacements in NUCA caches.
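The limitation described in the abstract, that each bank can only choose a victim among its own blocks, can be illustrated with a small sketch. This is a toy model and not one of the three techniques the paper proposes: the `Bank` class, the `global_lru` helper, the two-block bank capacity, and the block names are all assumptions made for illustration, and each bank is assumed to run plain LRU.

```python
from collections import OrderedDict


class Bank:
    """Hypothetical NUCA bank that keeps its own, purely local LRU order."""

    def __init__(self, capacity):
        self.capacity = capacity
        # address -> last-access timestamp, ordered from oldest to newest
        self.blocks = OrderedDict()

    def access(self, addr, now):
        """Touch addr in this bank; return the evicted address, or None."""
        victim = None
        if addr in self.blocks:
            self.blocks.move_to_end(addr)        # hit: becomes most recent
        elif len(self.blocks) >= self.capacity:
            victim, _ = self.blocks.popitem(last=False)  # evict local LRU
        self.blocks[addr] = now
        return victim


def global_lru(banks):
    """Address of the least recently used block across all banks."""
    candidates = [(ts, addr) for b in banks for addr, ts in b.blocks.items()]
    return min(candidates)[1] if candidates else None


def demo():
    banks = [Bank(2), Bank(2)]
    banks[1].access("A", 0)   # bank 1 holds the two oldest blocks
    banks[1].access("B", 1)
    banks[0].access("C", 2)   # bank 0 holds two more recent blocks
    banks[0].access("D", 3)
    best_victim = global_lru(banks)          # what a global policy would pick
    local_victim = banks[0].access("E", 4)   # what bank 0 actually evicts
    return local_victim, best_victim


if __name__ == "__main__":
    local_victim, best_victim = demo()
    print("bank-local victim:", local_victim)
    print("global LRU victim:", best_victim)
```

In this sketch the incoming block maps to bank 0, which evicts its own oldest block even though an older block sits untouched in bank 1. A replacement decision confined to one bank cannot see that better candidate.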