An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors

Authors:
Haakon Dybdahl; Per Stenstrom
Affiliations:
Norwegian University of Science and Technology, NO-7491 Trondheim, Norway. dybdahl@idi.ntnu.no; Chalmers University of Technology, SE-412 96 Goteborg, Sweden. pers@ce.chalmers.se
Venue:
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Year:
2007

Abstract

The significant speed gap between processor and memory and the limited chip memory bandwidth make last-level cache performance crucial for future chip multiprocessors. To use the capacity of shared last-level caches efficiently and to allow for a short access time, proposed non-uniform cache architectures (NUCAs) are organized into per-core partitions. If a core runs out of cache space, blocks are typically relocated to nearby partitions, so the cache is effectively managed as a shared cache. Unfortunately, this uncontrolled sharing of all resources may result in pollution that degrades performance. We propose a novel non-uniform cache architecture in which the amount of cache space that can be shared among the cores is controlled dynamically. The adaptive scheme continuously estimates the effect of increasing or decreasing the shared partition size on overall performance. We show that our scheme outperforms private and shared cache organizations as well as a hybrid NUCA organization in which blocks in a local partition can spill over to neighboring cores' partitions.
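
The abstract does not say how the scheme estimates the effect of resizing, so the following is only a minimal sketch of one way such an adaptive controller could work, not the authors' implementation. Everything in it is an assumption made for illustration: cache space is counted in ways per core, each core has two hypothetical counters (`gain_if_grown` for misses that one extra way would have avoided, `loss_if_shrunk` for hits that would be lost with one way fewer), and the controller runs once per sampling period.

```python
from dataclasses import dataclass


@dataclass
class CoreStats:
    """Hypothetical per-core state; all names are illustrative assumptions."""
    ways: int                # cache ways currently allotted to this core
    gain_if_grown: int = 0   # estimated misses avoided with one more way
    loss_if_shrunk: int = 0  # estimated hits lost with one way fewer


def repartition(cores, min_ways=1):
    """Move one way from the core that would lose least to the core that
    would gain most, but only if the estimated gain exceeds the loss."""
    gainer = max(range(len(cores)), key=lambda i: cores[i].gain_if_grown)
    # A donor must be a different core and must keep at least min_ways.
    donors = [i for i in range(len(cores))
              if i != gainer and cores[i].ways > min_ways]
    if donors:
        loser = min(donors, key=lambda i: cores[i].loss_if_shrunk)
        if cores[gainer].gain_if_grown > cores[loser].loss_if_shrunk:
            cores[gainer].ways += 1
            cores[loser].ways -= 1
    # Start a fresh sampling period.
    for core in cores:
        core.gain_if_grown = 0
        core.loss_if_shrunk = 0
    return cores


if __name__ == "__main__":
    # Made-up counter values for four cores with four ways each.
    cores = [CoreStats(4, 120, 30), CoreStats(4, 10, 5),
             CoreStats(4, 40, 60), CoreStats(4, 0, 50)]
    repartition(cores)
    print([c.ways for c in cores])  # prints [5, 3, 4, 4]
```

In this sketch the total number of ways is conserved, so growing one core's allotment always shrinks another's. A real design would also need hardware to produce the two estimates and a policy for the space that remains shared, neither of which is modeled here.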