Towards efficient dynamic LLC home bank mapping with NoC-level support

Authors:
Mario Lodde; José Flich; Manuel E. Acacio
Affiliations:
Universitat Politècnica de València, Spain; Universitat Politècnica de València, Spain; Universidad de Murcia, Spain
Venue:
Euro-Par '13: Proceedings of the 19th International Conference on Parallel Processing
Year:
2013

Abstract

In tiled Chip Multiprocessors (CMPs), the banks of the built-in last-level cache (LLC) are usually distributed among the tiles and logically shared. A static mapping of cache blocks to LLC banks leads to poor efficiency, since a block can be mapped to a bank far away from the tiles that actually access it. Partially dynamic policies have been proposed, but they either rely on a static mapping of blocks to a set of banks (D-NUCA) or rely on the OS to dynamically allocate pages at statically mapped addresses (first-touch). We propose a new dynamic approach in which the LLC home bank is determined at runtime in hardware, with the memory controller in charge of mapping each block when it is fetched from main memory. To speed up the home bank lookup process, we use simple and lightweight NoC optimizations. Compared with alternative solutions (S-NUCA, D-NUCA, first-touch, private LLCs), results with PARSEC and SPLASH-2 applications show improved locality of LLC blocks in the requesting tile (from 5.8% to 56.2%) and a reduction of more than 33% in load and store miss latencies. This leads to an average reduction of 24% in application execution time compared to static mapping.
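The difference between a static block-to-bank mapping and a home bank chosen at runtime can be illustrated with a small Python model. This is a sketch under stated assumptions, not the authors' mechanism: the 4x4 mesh, the 64-byte block size, and the names `static_home`, `DynamicMapper`, `home_bank`, `local_fraction` and `avg_hops` are hypothetical. The policy of homing a block in the bank of the tile whose miss triggered the memory fetch is also an illustrative assumption. The dictionary inside `DynamicMapper` stands in for the home-bank lookup that the paper accelerates with NoC support, and its latency is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 64   # bytes per cache block (assumed)
MESH_DIM = 4      # 4x4 tiled CMP with one LLC bank per tile (assumed)
NUM_BANKS = MESH_DIM * MESH_DIM

Trace = List[Tuple[int, int]]  # (requesting tile, byte address)


def block_addr(addr: int) -> int:
    """Drop the block offset from a byte address."""
    return addr // BLOCK_SIZE


def static_home(addr: int) -> int:
    """S-NUCA-style mapping: interleave on low-order block-address bits,
    independently of which tile issues the access."""
    return block_addr(addr) % NUM_BANKS


def hops(a: int, b: int) -> int:
    """Manhattan distance between two tiles in a 2D mesh."""
    ax, ay = divmod(a, MESH_DIM)
    bx, by = divmod(b, MESH_DIM)
    return abs(ax - bx) + abs(ay - by)


@dataclass
class DynamicMapper:
    """Hypothetical runtime mapping: when a block is first fetched from main
    memory, the memory controller homes it in the requesting tile's bank."""
    home: Dict[int, int] = field(default_factory=dict)

    def access(self, tile: int, addr: int) -> int:
        blk = block_addr(addr)
        if blk not in self.home:   # block not on chip yet: fetch from memory
            self.home[blk] = tile  # assumed policy: home = requester's bank
        # This dict lookup stands in for the hardware home-bank search.
        return self.home[blk]


def home_bank(tile: int, addr: int, mapper: Optional[DynamicMapper]) -> int:
    """Static mapping when no mapper is given, dynamic mapping otherwise."""
    if mapper is None:
        return static_home(addr)
    return mapper.access(tile, addr)


def local_fraction(trace: Trace, mapper: Optional[DynamicMapper] = None) -> float:
    """Fraction of accesses whose home bank sits in the requesting tile."""
    local = sum(home_bank(t, a, mapper) == t for t, a in trace)
    return local / len(trace)


def avg_hops(trace: Trace, mapper: Optional[DynamicMapper] = None) -> float:
    """Average mesh distance from the requester to the block's home bank."""
    return sum(hops(t, home_bank(t, a, mapper)) for t, a in trace) / len(trace)


if __name__ == "__main__":
    # Tile 5 touches 32 consecutive blocks (e.g. a private array).
    trace = [(5, a) for a in range(0, BLOCK_SIZE * 32, BLOCK_SIZE)]
    print(local_fraction(trace))                   # 0.0625 (2 of 32 blocks local)
    print(local_fraction(trace, DynamicMapper()))  # 1.0
    print(avg_hops(trace))                         # 2.0
    print(avg_hops(trace, DynamicMapper()))        # 0.0
```

With static interleaving, tile 5 finds only the blocks whose index is congruent to 5 modulo 16 in its own bank, so most accesses cross the mesh. Under the assumed runtime policy every block of this private trace is homed locally. With data shared by several tiles, the other tiles would still have to locate the home bank, which is the lookup cost this toy model leaves out.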