An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
An empirical evaluation of two memory-efficient directory methods
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
SLID - A Cost-Effektive and Scalable Limited-Directory Scheme for Cache Coherence
PARLE '93 Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe
Using cache memory to reduce processor-memory traffic
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
So Many States, So Little Time: Verifying Memory Coherence in the Cray X1
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
A New Scalable Directory Architecture for Large-Scale Multiprocessors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Coherence decoupling: making use of incoherence
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
A New Solution to Coherence Problems in Multicache Systems
IEEE Transactions on Computers
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Reactive NUCA: near-optimal block placement and replication in distributed caches
Proceedings of the 36th annual international symposium on Computer architecture
Subspace snooping: filtering snoops with operating system support
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
SPACE: sharing pattern-based directory coherence for multicore scalability
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
SWEL: hardware cache coherence protocols to map shared data onto shared caches
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Compiler-assisted data distribution for chip multiprocessors
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks
Proceedings of the 38th annual international symposium on Computer architecture
Cuckoo directory: A scalable directory for many-core systems
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Atomic Coherence: Leveraging nanophotonics to build race-free cache coherence protocols
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
A Primer on Memory Consistency and Cache Coherence
A Primer on Memory Consistency and Cache Coherence
A review of race detection mechanisms
CSR'06 Proceedings of the First international computer science conference on Theory and Applications
DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
POPS: Coherence Protocol Optimization for Both Private and Shared Data
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
A new perspective for efficient virtual-cache coherence
Proceedings of the 40th Annual International Symposium on Computer Architecture
Multi-grain coherence directories
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Much of the complexity and overhead (directory, state bits, invalidations) of a typical directory coherence implementation stems from the effort to make it "invisible" even to the strongest memory consistency model. In this paper, we show that a much simpler, directory-less/broadcast-less, multicore coherence can outperform a directory protocol but without its complexity and overhead. Motivated by recent efforts to simplify coherence, we propose a hardware approach that does not require any application guidance. The cornerstone of our approach is a dynamic, application-transparent, write-policy (write-back for private data, write-through for shared data), simplifying the protocol to just two stable states. Self-invalidation of the shared data at synchronization points allows us to remove the directory (and invalidations) completely, with just a data-race-free guarantee from software. This leads to our main result: a virtually costless coherence that outperforms a MESI directory protocol (by 4.8%) while at the same time reducing shared cache and network energy consumption (by 14.2%) for 15 parallel benchmarks, on 16 cores.