Cooperative cache partitioning for chip multiprocessors
Proceedings of the 21st annual international conference on Supercomputing
SP-NUCA: a cost effective dynamic non-uniform cache architecture
ACM SIGARCH Computer Architecture News
Towards hybrid last level caches for chip-multiprocessors
ACM SIGARCH Computer Architecture News
Compositional, dynamic cache management for embedded chip multiprocessors
Proceedings of the conference on Design, automation and test in Europe
Distributed cooperative caching
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
ACM: An Efficient Approach for Managing Shared Caches in Chip Multiprocessors
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Reactive NUCA: near-optimal block placement and replication in distributed caches
Proceedings of the 36th annual international symposium on Computer architecture
SlackSim: a platform for parallel simulations of CMPs on CMPs
ACM SIGARCH Computer Architecture News
Compositional, Dynamic Cache Management for Embedded Chip Multiprocessors
Journal of Signal Processing Systems
Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
No cache-coherence: a single-cycle ring interconnection for multi-core L1-NUCA sharing on 3D chips
Proceedings of the 46th Annual Design Automation Conference
Compiler-based data classification for hybrid caching
Proceedings of the 2010 Workshop on Interaction between Compilers and Computer Architecture
The auction: optimizing banks usage in Non-Uniform Cache Architectures
Proceedings of the 24th ACM International Conference on Supercomputing
Proceedings of the 37th annual international symposium on Computer architecture
Handling the problems and opportunities posed by multiple on-chip memory controllers
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Power-efficient spilling techniques for chip multiprocessors
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
A high performance adaptive miss handling architecture for chip multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers IV
Reconfigurable multicore architecture for dynamic processor reallocation
ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
Practically private: enabling high performance CMPs through compiler-assisted data classification
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Survey of scheduling techniques for addressing shared resources in multicore processors
ACM Computing Surveys (CSUR)
CRAW/P: a workload partition method for the efficient parallel simulation of manycores
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
A survey on cache tuning from a power/energy perspective
ACM Computing Surveys (CSUR)
Dynamic cache management in multi-core architectures through run-time adaptation
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Jigsaw: scalable software-defined caches
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
The significant speed-gap between processor and memory and the limited chip memory bandwidth make last-level cache performance crucial for future chip multiprocessors. To use the capacity of shared last-level caches efficiently and to allow for a short access time, proposed non-uniform cache architectures (NUCAs) are organized into per-core partitions. If a core runs out of cache space, blocks are typically relocated to nearby partitions, thus managing the cache as a shared cache. This uncontrolled sharing of all resources may unfortunately result in pollution that degrades performance. We propose a novel non-uniform cache architecture in which the amount of cache space that can be shared among the cores is controlled dynamically. The adaptive scheme estimates, continuously, the effect of increasing decreasing the shared partition size on the overall performance. We show that our scheme outperforms a private and shared cache organization as well as a hybrid NUCA organization in which blocks in a local partition can spill over to neighbor core partitions.