Bandwidth Adaptive Cache Coherence Optimizations for Chip Multiprocessors

Authors:
Abdullah Kayi;Olivier Serres;Tarek El-Ghazawi
Affiliations:
Intel PTD, Hillsboro, USA;The George Washington University, Washington, USA;The George Washington University, Washington, USA
Venue:
International Journal of Parallel Programming
Year:
2014

Abstract

Chip Multiprocessors (CMPs) have different technological parameters and physical constraints than earlier multiprocessor systems, and these should be taken into account when designing cache coherence protocols. Moreover, contemporary cache coherence protocols use invalidate schemes, which are known to generate a high number of coherence misses. This is especially true under producer-consumer sharing patterns, which can become a performance bottleneck as the number of cores increases. This paper presents two mechanisms for designing efficient and scalable cache coherence protocols for CMPs. First, we propose an adaptive hybrid protocol that reduces the coherence misses observed in write-invalidate protocols. The proposed protocol is based on a write-invalidate scheme but can adaptively push updates to potential consumers based on observed producer-consumer sharing patterns. Second, we extend this adaptive protocol with an interconnection-resource-aware mechanism. The proposed dynamic hybrid protocols are evaluated on a tiled CMP via full-system simulation. Performance analysis is presented for a set of scientific applications from the SPLASH-2 and NAS Parallel Benchmark suites. Results show that the proposed mechanisms reduce cache-to-cache sharing misses by up to 48% and speed up applications by up to 34%. In addition, the interconnection-resource-aware mechanism is shown to perform well under varying interconnect utilization.
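
The decision logic described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the protocol from the paper: the class and method names, the per-line counter, the pattern threshold, and the utilization limit are all hypothetical. It defaults to invalidation on a write and switches to pushing updates only after the same core has repeatedly written a line that other cores then read, and only while interconnect utilization is below a limit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class LineState:
    """Per-cache-line bookkeeping for the toy sharing detector (hypothetical)."""
    last_writer: Optional[int] = None
    readers_since_write: Set[int] = field(default_factory=set)
    pattern_count: int = 0  # consecutive write-then-remote-read rounds by one writer


class HybridCoherenceModel:
    """Toy write-invalidate model that adaptively pushes updates.

    All thresholds are illustrative assumptions, not values from the paper.
    """

    def __init__(self, pattern_threshold: int = 2, utilization_limit: float = 0.75):
        self.pattern_threshold = pattern_threshold
        self.utilization_limit = utilization_limit
        self.link_utilization = 0.0  # assumed to be sampled from the interconnect
        self.lines: Dict[int, LineState] = {}

    def read(self, core: int, addr: int) -> None:
        """Record a read; reads by non-writers mark that core as a consumer."""
        line = self.lines.setdefault(addr, LineState())
        if line.last_writer is not None and core != line.last_writer:
            line.readers_since_write.add(core)

    def write(self, core: int, addr: int) -> Tuple[str, List[int]]:
        """Return ("update", consumers) or ("invalidate", sharers) for this write."""
        line = self.lines.setdefault(addr, LineState())
        consumers = line.readers_since_write - {core}

        # Producer-consumer round: the same core writes again after others read.
        if line.last_writer == core and consumers:
            line.pattern_count += 1
        else:
            line.pattern_count = 0

        line.last_writer = core
        line.readers_since_write = set()

        # Push updates only for an established pattern and a lightly loaded network.
        pattern_seen = line.pattern_count >= self.pattern_threshold
        bandwidth_ok = self.link_utilization < self.utilization_limit
        action = "update" if pattern_seen and bandwidth_ok else "invalidate"
        return action, sorted(consumers)
```

In this sketch, a line written by core 0 and read by core 1 in alternation stays in invalidate mode for the first rounds, moves to update mode once the pattern count reaches the threshold, and falls back to invalidation whenever `link_utilization` is raised above the limit.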