Multi-core architectures, also referred to as chip multiprocessors (CMPs), have emerged as the dominant architecture for both desktop and high-performance systems. CMPs introduce many challenges that must be addressed to achieve the best performance. One major challenge, which arises from the shared-memory model used in such architectures, is cache coherence overhead. Contemporary architectures employ write-invalidate protocols, which are known to generate coherence misses that lead to latency problems. Write-update protocols, on the other hand, can avoid these coherence misses, but they tend to generate excessive network traffic, which is especially undesirable for CMPs. Previous studies have shown that a single-protocol approach is not sufficient for many sharing patterns. As a solution, this paper evaluates an adaptive protocol that applies write-update optimizations to producer-consumer sharing patterns. The work takes a minimal hardware extension approach in order to test the benefits of such adaptive protocols in a practical environment. The experimental study is conducted on a 16-core CMP using a full-system simulator, with selected scientific applications from the SPLASH-2 and NAS Parallel Benchmark suites. Results show up to 40% improvement in coherence misses, which corresponds to a 15% application speedup.
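The trade-off between write-invalidate and write-update can be illustrated with a toy model. The sketch below is not the paper's protocol or its simulator: it assumes a single cache line, one producer core, and a fixed set of consumer cores, and it counts messages in a deliberately simplified way (one message per invalidation or update, two per miss). The names `Stats` and `simulate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    cold_misses: int = 0        # first access by a consumer
    coherence_misses: int = 0   # re-fetch after the copy was invalidated
    messages: int = 0           # simplified network message count


def simulate(protocol, consumers, rounds, writes_per_round):
    """Toy producer-consumer model on one cache line (an assumption, not the paper's setup).

    Each round the producer writes the line `writes_per_round` times,
    then every consumer reads it once.
    """
    if protocol not in ("invalidate", "update"):
        raise ValueError(f"unknown protocol: {protocol}")
    stats = Stats()
    valid = set()   # consumers currently holding a valid copy
    seen = set()    # consumers that have fetched the line at least once
    for _ in range(rounds):
        for _ in range(writes_per_round):
            # Both protocols send one message per current sharer on a write.
            stats.messages += len(valid)
            if protocol == "invalidate":
                valid.clear()   # sharers lose their copies
            # Under "update" the sharers keep fresh copies.
        for core in range(consumers):
            if core in valid:
                continue        # read hit
            if core in seen:
                stats.coherence_misses += 1
            else:
                stats.cold_misses += 1
                seen.add(core)
            stats.messages += 2  # request plus data reply
            valid.add(core)
    return stats


if __name__ == "__main__":
    for protocol in ("invalidate", "update"):
        for writes in (2, 4):
            print(protocol, writes, simulate(protocol, 3, 4, writes))
```

In this model write-update removes every coherence miss, while its message count grows with the number of producer writes per round. Write-invalidate pays a fixed cost per round regardless of how many times the producer writes. An adaptive protocol of the kind the abstract describes aims to use updates only where the sharing pattern makes them worthwhile.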