An adaptive cache coherence protocol for chip multiprocessors

Authors:
Abdullah Kayi;Tarek El-Ghazawi
Affiliations:
The George Washington University, Washington, DC;The George Washington University, Washington, DC
Venue:
Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
Year:
2010

Citing 42
Cited 0

Alternative implementations of two-level adaptive branch prediction

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Adaptive cache coherency for detecting migratory shared data

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An adaptive cache coherence protocol optimized for migratory sharing

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Boosting the performance of hybrid snooping cache protocols

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Data forwarding in scalable shared-memory multiprocessors

ICS '95 Proceedings of the 9th international conference on Supercomputing
Evaluation of a competitive-update cache coherence protocol with migratory data detection

Journal of Parallel and Distributed Computing
Using prediction to accelerate coherence protocols

Proceedings of the 25th annual international symposium on Computer architecture
Memory sharing predictor: the key to a speculative coherent DSM

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Multicast snooping: a new coherence method using a multicast address network

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Selective, accurate, and timely self-invalidation using last-touch prediction

Proceedings of the 27th annual international symposium on Computer architecture
Timestamp snooping: an approach for extending SMPs

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Simics: A Full System Simulation Platform

Computer
The Scalable Coherent Interface and Related Standards Projects

IEEE Micro
Simulating a $2M Commercial Server on a $2K PC

Computer
A Novel Approach to Reduce L2 Miss Latency in Shared-Memory Multiprocessors

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
An Adaptive Update-Based Cache Coherence Protocol for Reduction of Miss Rate and Traffic

PARLE '94 Proceedings of the 6th International PARLE Conference on Parallel Architectures and Languages Europe
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Two Adaptive Hybrid Cache Coherency Protocols

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Distance-Adaptive Update Protocols for Scalable Shared-Memory Multiprocessors

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
An Evaluation of Fine-Grain Producer-Initiated Communication in Cache-Coherent Multiprocessors

HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Improving CC-NUMA Performance Using Instruction-Based Prediction

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
The Coherence Predictor Cache: A Resource-Efficient and Accurate Coherence Prediction Infrastructure

IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Reducing Ownership Overhead for Load-Store Sequences in Cache-Coherent Multiprocessors

IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Improving Performance of Load-Store Sequences for Transaction Processing Workloads on Multiprocessors

ICPP '99 Proceedings of the 1999 International Conference on Parallel Processing
Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory multiprocessors

Proceedings of the 30th annual international symposium on Computer architecture
Bandwidth Adaptive Snooping

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Improving Multiple-CMP Systems Using Token Coherence

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Industry Trends: Chip Makers Turn to Multicore Processors

Computer
The Impact of Performance Asymmetry in Emerging Multicore Architectures

Proceedings of the 32nd annual international symposium on Computer Architecture
The Future of Microprocessors

Queue - Multiprocessors
Formal Verification and its Impact on the Snooping versus Directory Protocol Debate

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Interconnect-Aware Coherence Protocols for Chip Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
In-Network Cache Coherence

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Multicore Processors for Science and Engineering

Computing in Science and Engineering
Reducing the Write Traffic for a Hybrid Cache Protocol

ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
The kill rule for multicore

Proceedings of the 44th annual Design Automation Conference
An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Extending CC-NUMA systems to support write update optimizations

Proceedings of the 2008 ACM/IEEE conference on Supercomputing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Multi-core architectures also referred to as Chip Multiprocessors (CMPs) have emerged as the dominant architecture for both desktop and high-performance systems. CMPs introduce many challenges that need to be addressed to achieve the best performance. One of the big challenges comes with the shared-memory model observed in such architectures which is the cache coherence overhead problem. Contemporary architectures employ write-invalidate based protocols which are known to generate coherence misses that yield to latency issues. On the other hand, write-update based protocols can solve the coherence misses problem but they tend to generate excessive network traffic which is especially not desirable for CMPs. Previous studies have shown that a single protocol approach is not sufficient for many sharing patterns. As a solution, this paper evaluates an adaptive protocol which targets write-update optimizations for producer-consumer sharing patterns. This work targets a minimalistic hardware extension approach to test the benefits of such adaptive protocols in a practical environment. Experimental study is conducted on a 16-core CMP by using a full-system simulator with selected scientific applications from SPLASH-2 and NAS parallel benchmark suites. Results show up to 40% improvement for coherence misses which corresponds to 15% application speedup.