A NUCA Substrate for Flexible CMP Cache Sharing

Authors:
Jaehyuk Huh; Changkyu Kim; H. Shafi; Lixin Zhang; D. Burger; S. W. Keckler
Affiliations:
Adv. Micro Devices, Sunnyvale;-;-;-;-;-
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2007

Citing 0
Cited 14

Analysis of static and dynamic energy consumption in NUCA caches: initial results
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture

SP-NUCA: a cost effective dynamic non-uniform cache architecture
ACM SIGARCH Computer Architecture News

SlackSim: a platform for parallel simulations of CMPs on CMPs
ACM SIGARCH Computer Architecture News

Compiler-based data classification for hybrid caching
Proceedings of the 2010 Workshop on Interaction between Compilers and Computer Architecture

Multi-CMP system with data communication on the fly
The Journal of Supercomputing

A scalable multiprocessor architecture for pervasive computing
GPC'11 Proceedings of the 6th international conference on Advances in grid and pervasive computing

Reducing energy and increasing performance with traffic optimization in many-core systems
Proceedings of the System Level Interconnect Prediction Workshop

CMP off-chip bandwidth scheduling guided by instruction criticality
Proceedings of the 27th international ACM conference on International conference on supercomputing

A survey on cache tuning from a power/energy perspective
ACM Computing Surveys (CSUR)

Addressing the challenges of future large-scale many-core architectures
Proceedings of the ACM International Conference on Computing Frontiers

High-endurance hybrid cache design in CMP architecture with cache partitioning and access-aware policy
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI

Locality-aware task management for unstructured parallelism: a quantitative limit study
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures

Jigsaw: scalable software-defined caches
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

The case for a scalable coherence protocol for complex on-chip cache hierarchies in many core systems
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Abstract

We propose an organization for the on-chip memory system of a chip multiprocessor in which 16 processors share a 16-Mbyte pool of 64 level-2 (L2) cache banks. The L2 cache is organized as a nonuniform cache architecture (NUCA) array with a switched network embedded in it for high performance. We show that this organization can support a spectrum of degrees of sharing, ranging from unshared, in which each processor owns a private portion of the cache, thus reducing hit latency, to completely shared, in which every processor shares the entire cache, thus minimizing misses, with every point in between. We measure the optimal degree of sharing for different cache bank mapping policies and also evaluate a per-application cache partitioning strategy. We conclude that a static NUCA organization with sharing degrees of 2 or 4 works best across a suite of commercial and scientific parallel workloads. We demonstrate that migratory dynamic NUCA approaches improve performance significantly for a subset of the workloads at the cost of increased complexity, especially as per-application cache partitioning strategies are applied. We also evaluate the energy efficiency of each design point in terms of network traffic, bank accesses, and external memory accesses.
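
The notion of a sharing degree can be illustrated with a small sketch. The processor, bank, and capacity counts below come from the abstract; everything else is an assumption made for illustration only: the function names are hypothetical, banks are assumed to be equal in size, and processors and banks are assumed to be grouped contiguously by index. The abstract does not specify the actual bank mapping policies, so this is not the authors' method.

```python
# Illustrative sketch only: contiguous grouping is an assumption,
# not the bank mapping policy evaluated in the paper.
NUM_PROCESSORS = 16   # from the abstract
NUM_BANKS = 64        # from the abstract
CACHE_MBYTES = 16     # from the abstract


def sharing_partition(degree):
    """Map each processor to the banks it may use at a given sharing degree.

    degree = 1 is the unshared extreme (private banks per processor);
    degree = NUM_PROCESSORS is the completely shared extreme.
    """
    if degree <= 0 or NUM_PROCESSORS % degree != 0:
        raise ValueError("degree must evenly divide the processor count")
    num_groups = NUM_PROCESSORS // degree
    banks_per_group = NUM_BANKS // num_groups
    partition = {}
    for group in range(num_groups):
        first_bank = group * banks_per_group
        banks = list(range(first_bank, first_bank + banks_per_group))
        # Every processor in the group sees the same set of banks.
        for cpu in range(group * degree, (group + 1) * degree):
            partition[cpu] = banks
    return partition


def capacity_per_group_mbytes(degree):
    """Cache capacity visible to one sharing group, assuming equal-size banks."""
    return CACHE_MBYTES * degree / NUM_PROCESSORS


if __name__ == "__main__":
    for degree in (1, 2, 4, 8, 16):
        banks = sharing_partition(degree)[0]
        print(degree, len(banks), capacity_per_group_mbytes(degree))
```

Under these assumptions, a lower degree gives each processor fewer banks (which could be placed nearby to reduce hit latency), while a higher degree exposes more total capacity to each processor, which is the trade-off the abstract describes.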