ACM: An Efficient Approach for Managing Shared Caches in Chip Multiprocessors

Authors:
Mohammad Hammoud;Sangyeun Cho;Rami Melhem
Affiliations:
Department of Computer Science, University of Pittsburgh,;Department of Computer Science, University of Pittsburgh,;Department of Computer Science, University of Pittsburgh,
Venue:
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Year:
2008

Citing 19
Cited 7

Introducing memory into the switch elements of multiprocessor interconnection networks

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
DDM: A Cache-Only Memory Architecture

Computer
Scheduling and page migration for multiprocessor compute servers

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Reactive NUMA: a design for unifying S-COMA and CC-NUMA

Proceedings of the 24th annual international symposium on Computer architecture
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Low-Latency Virtual-Channel Routers for On-Chip Networks

Proceedings of the 31st annual international symposium on Computer architecture
Managing Wire Delay in Large Chip-Multiprocessor Caches

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors

Proceedings of the 32nd annual international symposium on Computer Architecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs

Proceedings of the 32nd annual international symposium on Computer Architecture
A NUCA substrate for flexible CMP cache sharing

Proceedings of the 19th annual international conference on Supercomputing
Cooperative Caching for Chip Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
POWER5 System microarchitecture

IBM Journal of Research and Development - POWER5 and packaging
ASR: Adaptive Selective Replication for CMP Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Virtual hierarchies to support server consolidation

Proceedings of the 34th annual international symposium on Computer architecture
An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Implementation and evaluation of a migration-based NUCA design for chip multiprocessors

SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems

Cache equalizer: a placement mechanism for chip multiprocessor distributed shared caches

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
NoC-aware cache design for multithreaded execution on tiled chip multiprocessors

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Research note: C-AMTE: A location mechanism for flexible cache management in chip multiprocessors

Journal of Parallel and Distributed Computing
The migration prefetcher: Anticipating data promotion in dynamic NUCA caches

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Replacement techniques for dynamic NUCA cache designs on CMPs

The Journal of Supercomputing
Towards efficient dynamic LLC home bank mapping with noc-level support

Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Exploiting replication to improve performances of NUCA-based CMP systems

ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper proposes and studies a hardware-based adaptive controlled migration strategy for managing distributed L2 caches in chip multiprocessors. Building on an area-efficient shared cache design, the proposed scheme dynamically migrates cache blocks to cache banks that best minimize the average L2 access latency. Cache blocks are continuously monitored and the locations of the optimal corresponding cache banks are predicted to effectively alleviate the impact of non-uniform cache access latency. By adopting migration alone without replication, the exclusiveness of cache blocks is maintained, thus further optimizing the cache miss rate. Simulation results using a full system simulator demonstrate that the proposed controlled migration scheme outperforms the shared caching strategy and compares favorably with previously proposed replication schemes.