Direct coherence: bringing together performance and scalability in shared-memory multiprocessors

Authors:
Alberto Ros;Manuel E. Acacio;José M. García
Affiliations:
Departamento de Ingeniería y Tecnología de Computadores, Universidad de Murcia, Murcia, Spain;Departamento de Ingeniería y Tecnología de Computadores, Universidad de Murcia, Murcia, Spain;Departamento de Ingeniería y Tecnología de Computadores, Universidad de Murcia, Murcia, Spain
Venue:
HiPC'07 Proceedings of the 14th international conference on High performance computing
Year:
2007

Citing 13
Cited 0

Hitting the memory wall: implications of the obvious

ACM SIGARCH Computer Architecture News
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Timestamp snooping: an approach for extending SMPs

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Parallel Computer Architecture: A Hardware/Software Approach

Parallel Computer Architecture: A Hardware/Software Approach
RSIM: Simulating Shared-Memory Multiprocessors with ILP Processors

Computer
The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Token coherence: decoupling performance and correctness

Proceedings of the 30th annual international symposium on Computer architecture
Bandwidth Adaptive Snooping

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
An efficient cache design for scalable glueless shared-memory multiprocessors

Proceedings of the 3rd conference on Computing frontiers
Cooperative Caching for Chip Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
A novel lightweight directory architecture for scalable shared-memory multiprocessors

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Traditional directory-based cache coherence protocols suffer from long-latency cache misses as a consequence of the indirection introduced by the home node, which must be accessed on every cache miss before any coherence action can be performed. In this work we present a new protocol that moves the role of storing up-to-date coherence information (and thus ensuring totally ordered accesses) from the home node to one of the sharing caches. Our protocol allows most cache misses to be directly solved from the corresponding remote caches, without requiring the intervention of the home node. In this way, cache miss latencies are reduced. Detailed simulations show that this protocol leads to improvements in total execution time of 8% on average over a highly optimized MOESI directory-based protocol.