The locality-aware adaptive cache coherence protocol

Authors:
George Kurian;Omer Khan;Srinivas Devadas
Affiliations:
Massachusetts Institute of Technology, Cambridge, MA;University of Connecticut, Storrs, CT;Massachusetts Institute of Technology, Cambridge, MA
Venue:
Proceedings of the 40th Annual International Symposium on Computer Architecture
Year:
2013

Abstract

Next-generation multicore applications will process massive amounts of data with significant sharing. Data movement and management impact memory access latency and consume power, so harnessing data locality is of fundamental importance in future processors. We propose a scalable, efficient shared-memory cache coherence protocol that enables seamless adaptation between private and logically shared caching of on-chip data at the fine granularity of cache lines. Our data-centric approach relies on in-hardware yet low-overhead runtime profiling of the locality of each cache line and allows private caching only for data blocks with high spatio-temporal locality. This lets us better exploit the private caches and enables low-latency, low-energy memory access while retaining the convenience of shared memory. On a set of parallel benchmarks, our low-overhead locality-aware mechanisms reduce overall energy by 25% and completion time by 15% in an NoC-based multicore with the Reactive-NUCA on-chip cache organization and the ACKwise limited directory-based coherence protocol.
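
The per-line classification idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism the authors built: the names `LocalityClassifier`, `LineProfile`, and `PRIVATE_THRESHOLD`, the threshold value, the choice to start lines in private mode, and the use of a single counter per line (rather than per-core state) are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical value: number of reuses a line needs to qualify for private caching.
PRIVATE_THRESHOLD = 4


@dataclass
class LineProfile:
    """Per-cache-line locality state (illustrative, not the paper's hardware layout)."""
    private_mode: bool = True  # assumption: lines start out privately cacheable
    reuse_count: int = 0       # accesses seen since the last reclassification


@dataclass
class LocalityClassifier:
    """Toy model: allow private caching only for lines that show enough reuse."""
    threshold: int = PRIVATE_THRESHOLD
    lines: dict = field(default_factory=dict)

    def access(self, line_addr: int) -> str:
        """Record one access and report how it would be served."""
        profile = self.lines.setdefault(line_addr, LineProfile())
        profile.reuse_count += 1
        # A line in remote (logically shared) mode is promoted once it has
        # accumulated enough reuse to justify a private copy.
        if not profile.private_mode and profile.reuse_count >= self.threshold:
            profile.private_mode = True
        return "private" if profile.private_mode else "remote"

    def evict(self, line_addr: int) -> None:
        """On eviction or invalidation, reclassify the line from its observed reuse."""
        profile = self.lines.get(line_addr)
        if profile is None:
            return
        profile.private_mode = profile.reuse_count >= self.threshold
        profile.reuse_count = 0


if __name__ == "__main__":
    classifier = LocalityClassifier(threshold=3)
    print(classifier.access(0x40))   # first touch is served privately
    classifier.evict(0x40)           # only one use: demoted to remote mode
    print([classifier.access(0x40) for _ in range(3)])
```

In this model a line that is evicted after little reuse is demoted, so later accesses are served at the shared location, and it is promoted back only after it shows enough reuse. A real design would track this per core and per line in the directory, which the sketch deliberately omits.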