In tiled Chip Multiprocessors (CMPs), the banks of the on-chip last-level cache (LLC) are usually distributed among the tiles and logically shared. A static mapping of cache blocks to LLC banks leads to poor efficiency, since a block can be mapped to a bank far away from the tiles that actually access it. Partially dynamic policies have been proposed, but they either rely on a static mapping of blocks to a set of banks (D-NUCA) or rely on the OS to dynamically load pages to statically mapped addresses (first-touch). We propose a new dynamic approach in which the LLC home bank is determined at runtime in hardware, with the memory controller in charge of mapping each block when it is fetched from main memory. To speed up the home bank lookup process, we use simple and lightweight NoC optimizations. Compared with alternative solutions (S-NUCA, D-NUCA, first-touch, private LLCs), results with PARSEC and SPLASH-2 applications indicate an improvement in the locality of LLC blocks in the same tile (56.2%, up from 5.8%) and a reduction of more than 33% in load and store miss latencies. This leads to an average reduction of 24% in application execution time compared to static mapping.
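The contrast between static mapping and a home bank chosen at runtime can be illustrated with a small sketch. It is not the mechanism evaluated in the paper: the 4x4 mesh, the 64-byte block size, the address-interleaved static mapping, and the policy of assigning a block to the requesting tile on its first fetch are all assumptions made for illustration, and the names `static_home`, `RuntimeHomeMap`, and `hops` are hypothetical.

```python
from dataclasses import dataclass, field

MESH_DIM = 4      # assumption: 4x4 tiled CMP, one LLC bank per tile
BLOCK_BITS = 6    # assumption: 64-byte cache blocks


def hops(tile_a: int, tile_b: int) -> int:
    """Manhattan distance between two tiles on the assumed 2D mesh."""
    row_a, col_a = divmod(tile_a, MESH_DIM)
    row_b, col_b = divmod(tile_b, MESH_DIM)
    return abs(row_a - row_b) + abs(col_a - col_b)


def static_home(addr: int) -> int:
    """Static mapping sketch: block address interleaved across all banks."""
    return (addr >> BLOCK_BITS) % (MESH_DIM * MESH_DIM)


@dataclass
class RuntimeHomeMap:
    """Hypothetical stand-in for a home bank decided when a block is fetched."""
    table: dict = field(default_factory=dict)

    def home(self, addr: int, requester: int) -> int:
        block = addr >> BLOCK_BITS
        # Assumed policy: the first tile to fetch the block from memory
        # becomes its home; later requesters see the same home bank.
        return self.table.setdefault(block, requester)


addr = 0x1240
runtime = RuntimeHomeMap()
static_distance = hops(0, static_home(addr))        # tile 0 requests the block
runtime_distance = hops(0, runtime.home(addr, 0))   # home assigned at first fetch
```

Under these assumptions, the static mapping places the block in a bank three hops from the requesting tile, while the runtime mapping keeps it in the requester's own tile. A real design would also need a way to find the home bank of a block that another tile fetched first, which is the lookup the abstract says is accelerated with NoC optimizations.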