Chip multiprocessors (CMPs) issue write invalidations (WIs) to ensure program correctness. In conventional snoop-based protocols, writers broadcast invalidations to all nodes as soon as possible. In this work we show that this approach, while protecting correctness, is inefficient for two reasons. First, many of the invalidated blocks are not accessed after invalidation, making the invalidation unnecessary. Second, many of the invalidated blocks are not accessed anytime soon, making immediate invalidation unnecessary. While invalidations of the first group could be avoided altogether, those of the second group could be delayed without any performance or correctness cost. Accordingly, we show that there is ample opportunity to eliminate or delay many WIs without harming performance or correctness. Moreover, we investigate invalidation necessity and urgency and show that a large share of WIs could be delayed without affecting program outcome. Our study shows that WIs often repeat their behavior from both the necessity and urgency points of view. Finally, we study how eliminating unnecessary WIs could reduce bus occupancy.
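The necessity and urgency distinction can be illustrated with a small trace-driven sketch. This is not the authors' methodology. It assumes a hypothetical memory trace of `(time, core, op, block)` tuples, a simplified sharer set per block, and an arbitrary `urgency_window` threshold. The labels `unnecessary`, `delayable`, and `urgent` are illustrative names for the groups described above.

```python
from collections import defaultdict


def classify_invalidations(trace, urgency_window=100):
    """Classify each write invalidation in a time-sorted trace.

    trace: list of (time, core, op, block), where op is "R" or "W".
    urgency_window: hypothetical threshold (in the trace's time units)
    separating soon-needed invalidations from ones that could be delayed.
    """
    # Record every access time per (core, block) so we can look ahead.
    accesses = defaultdict(list)
    for time, core, op, block in trace:
        accesses[(core, block)].append(time)

    sharers = defaultdict(set)  # block -> cores currently holding a copy
    counts = {"unnecessary": 0, "delayable": 0, "urgent": 0}

    for time, core, op, block in trace:
        if op == "W":
            # Every other sharer receives an invalidation for this block.
            for victim in sharers[block] - {core}:
                later = [t for t in accesses[(victim, block)] if t > time]
                if not later:
                    counts["unnecessary"] += 1  # copy is never touched again
                elif later[0] - time > urgency_window:
                    counts["delayable"] += 1    # next access is far away
                else:
                    counts["urgent"] += 1       # next access comes soon
            sharers[block] = {core}
        else:
            sharers[block].add(core)
    return counts


# Made-up example: four cores read block "A", then core 0 writes it.
example_trace = [
    (0, 0, "R", "A"),
    (1, 1, "R", "A"),
    (2, 2, "R", "A"),
    (3, 3, "R", "A"),
    (10, 0, "W", "A"),
    (15, 1, "R", "A"),   # core 1 re-reads soon after the write
    (500, 2, "R", "A"),  # core 2 re-reads much later
]                        # core 3 never touches "A" again
print(classify_invalidations(example_trace))
```

In this made-up trace the single write produces one invalidation of each kind. A real study would need a full coherence model and measured traces rather than this simplified sharer set.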