Write invalidation analysis in chip multiprocessors

Authors:
Newsha Ardalani;Amirali Baniasadi
Affiliations:
School of Computer Science, IPM, Tehran, Iran;ECE Department, University of Victoria, Victoria, B. C., Canada
Venue:
PATMOS'09 Proceedings of the 19th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
Year:
2009

Citing 10
Cited 0

Alternative implementations of two-level adaptive branch prediction

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Adaptive cache coherency for detecting migratory shared data

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An adaptive cache coherence protocol optimized for migratory sharing

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Using prediction to accelerate coherence protocols

Proceedings of the 25th annual international symposium on Computer architecture
Selective, accurate, and timely self-invalidation using last-touch prediction

Proceedings of the 27th annual international symposium on Computer architecture
The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

Chip multiprocessors (CMPs) issue write invalidations (WIs) to assure program correctness. In conventional snoop-based protocols, writers broadcast invalidations to all nodes as soon as possible. In this work we show that this approach, while protecting correctness, is inefficient due to two reasons. First, many of the invalidated blocks are not accessed after invalidation making the invalidation unnecessary. Second, among the invalidated blocks many are not accessed anytime soon, making immediate invalidation unnecessary. While invalidating the first group could be avoided altogether, the second group’s invalidation could be delayed without any performance or correctness cost. Accordingly, we show that there exists an ample opportunity to eliminate and/or delay many WIs without harming performance or correctness. Moreover we investigate invalidation necessity and urgency and show that a large share of WIs could be delayed without impacting program outcome. Our study shows that WIs often repeats their behavior from both the necessity and urgency point of view. Finally we study how eliminating unnecessary WIs could potentially reduce bus occupancy.