Complexity-effective multicore coherence

Authors:
Alberto Ros;Stefanos Kaxiras
Affiliations:
Universidad de Murcia, Murcia, Spain;Uppsala University, Uppsala, Sweden
Venue:
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Year:
2012

Citing 30
Cited 2

An evaluation of directory schemes for cache coherence

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
LimitLESS directories: A scalable cache coherence scheme

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
An empirical evaluation of two memory-efficient directory methods

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Cache decay: exploiting generational behavior to reduce cache leakage power

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Shared Memory Consistency Models: A Tutorial

Computer
Simics: A Full System Simulation Platform

Computer
SLID - A Cost-Effektive and Scalable Limited-Directory Scheme for Cache Coherence

PARLE '93 Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe
Using cache memory to reduce processor-memory traffic

ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
So Many States, So Little Time: Verifying Memory Coherence in the Cray X1

IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
A New Scalable Directory Architecture for Large-Scale Multiprocessors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Coherence decoupling: making use of incoherence

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
A New Solution to Coherence Problems in Multicache Systems

IEEE Transactions on Computers
The PARSEC benchmark suite: characterization and architectural implications

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Reactive NUCA: near-optimal block placement and replication in distributed caches

Proceedings of the 36th annual international symposium on Computer architecture
Subspace snooping: filtering snoops with operating system support

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
SPACE: sharing pattern-based directory coherence for multicore scalability

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
SWEL: hardware cache coherence protocols to map shared data onto shared caches

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Compiler-assisted data distribution for chip multiprocessors

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
SARC Coherence: Scaling Directory Cache Coherence in Performance and Power

IEEE Micro
Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks

Proceedings of the 38th annual international symposium on Computer architecture
Cuckoo directory: A scalable directory for many-core systems

HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Atomic Coherence: Leveraging nanophotonics to build race-free cache coherence protocols

HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
A Primer on Memory Consistency and Cache Coherence

A Primer on Memory Consistency and Cache Coherence
A review of race detection mechanisms

CSR'06 Proceedings of the First international computer science conference on Theory and Applications
DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism

PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
POPS: Coherence Protocol Optimization for Both Private and Shared Data

PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques

A new perspective for efficient virtual-cache coherence

Proceedings of the 40th Annual International Symposium on Computer Architecture
Multi-grain coherence directories

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

Much of the complexity and overhead (directory, state bits, invalidations) of a typical directory coherence implementation stems from the effort to make it "invisible" even to the strongest memory consistency model. In this paper, we show that a much simpler, directory-less/broadcast-less, multicore coherence can outperform a directory protocol but without its complexity and overhead. Motivated by recent efforts to simplify coherence, we propose a hardware approach that does not require any application guidance. The cornerstone of our approach is a dynamic, application-transparent, write-policy (write-back for private data, write-through for shared data), simplifying the protocol to just two stable states. Self-invalidation of the shared data at synchronization points allows us to remove the directory (and invalidations) completely, with just a data-race-free guarantee from software. This leads to our main result: a virtually costless coherence that outperforms a MESI directory protocol (by 4.8%) while at the same time reducing shared cache and network energy consumption (by 14.2%) for 15 parallel benchmarks, on 16 cores.