Adaptive cache coherency for detecting migratory shared data
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Data forwarding in scalable shared-memory multiprocessors
ICS '95 Proceedings of the 9th international conference on Supercomputing
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Selective, accurate, and timely self-invalidation using last-touch prediction
Proceedings of the 27th annual international symposium on Computer architecture
Route packets, not wires: on-chip inteconnection networks
Proceedings of the 38th annual Design Automation Conference
An Evaluation of Fine-Grain Producer-Initiated Communication in Cache-Coherent Multiprocessors
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Improving CC-NUMA Performance Using Instruction-Based Prediction
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Low-Latency Virtual-Channel Routers for On-Chip Networks
Proceedings of the 31st annual international symposium on Computer architecture
A low latency router supporting adaptivity for on-chip interconnects
Proceedings of the 42nd annual Design Automation Conference
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Interconnect-Aware Coherence Protocols for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Microarchitecture of the Godson-2 processor
Journal of Computer Science and Technology
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Chip multiprocessors (CMPs) with on-chip network connecting processor cores have been pervasively accepted as a promising technology to efficiently utilize the ever increasing density of transistors on a chip. Communications in CMPs require invalidating cached copies of a shared data block. The coherence traffic incurs more and more significant overhead as the number of cores in a CMP increases. Conventional designs of cache coherence protocols do not take into account characteristics of underlying networks for flexibility reasons. However, in CMPs, processor cores and the on-chip network are tightly integrated. Exposing the network features to cache coherence protocols will unveil some optimization opportunities. In this paper, we propose distance aware protocol and multi-target invalidations, which exploit the network characteristics to reduce the invalidation traffic overhead at negligible hardware cost. Experimental results on a 16-core CMP simulator showed that the two mechanisms reduced the average invalidation traffic latency by 5%, up to 8%.