Concerning with on-chip network features to improve cache coherence protocols for CMPs

  • Authors:
  • Hongbo Zeng;Kun Huang;Ming Wu;Weiwu Hu

  • Affiliations:
  • Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China and Graduate University of the Chinese Academy of Sciences, Beiji ...;Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China and Graduate University of the Chinese Academy of Sciences, Beiji ...;Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China and Graduate University of the Chinese Academy of Sciences, Beiji ...;Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

  • Venue:
  • ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
  • Year:
  • 2007

Abstract

Chip multiprocessors (CMPs), in which an on-chip network connects the processor cores, are widely accepted as a promising way to exploit the ever-increasing transistor density of a chip. Communication in CMPs requires invalidating cached copies of shared data blocks, and the overhead of this coherence traffic grows as the number of cores increases. Conventional cache coherence protocols are designed without regard to the characteristics of the underlying network, for the sake of flexibility. In CMPs, however, the processor cores and the on-chip network are tightly integrated, so exposing network features to the cache coherence protocol opens up optimization opportunities. In this paper, we propose a distance-aware protocol and multi-target invalidations, which exploit network characteristics to reduce invalidation traffic overhead at negligible hardware cost. Experimental results on a 16-core CMP simulator show that the two mechanisms reduce the average invalidation traffic latency by 5%, and by up to 8%.
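
The abstract does not describe how the two mechanisms work, so the following is only a rough sketch of why network distance can matter for invalidation traffic, not the authors' protocol. It assumes the 16 cores form a 4x4 mesh with minimal XY routing, and it compares the link traversals of one invalidation message per sharer against a single message that visits the sharers in nearest-first order. The mesh layout, the routing, the visiting order, and all function names are hypothetical.

```python
# Illustrative only: the 4x4 mesh, XY routing, and the nearest-first chain
# are assumptions, not details taken from the paper.

MESH_WIDTH = 4  # assumed: 16 cores arranged as a 4x4 mesh


def coords(node):
    """Map a node id (0..15) to (x, y) mesh coordinates."""
    return node % MESH_WIDTH, node // MESH_WIDTH


def hops(src, dst):
    """Manhattan distance, i.e. the hop count under minimal XY routing."""
    (sx, sy), (dx, dy) = coords(src), coords(dst)
    return abs(sx - dx) + abs(sy - dy)


def unicast_invalidation_hops(home, sharers):
    """Link traversals when the home node sends one message per sharer."""
    return sum(hops(home, s) for s in sharers)


def chained_invalidation_hops(home, sharers):
    """Link traversals when one message visits the sharers nearest-first."""
    total, current, remaining = 0, home, set(sharers)
    while remaining:
        # Pick the closest unvisited sharer; break ties by node id.
        nxt = min(remaining, key=lambda s: (hops(current, s), s))
        total += hops(current, nxt)
        current = nxt
        remaining.remove(nxt)
    return total
```

Under these assumptions, a home node 0 with sharers 1, 2, and 3 on the same row needs 6 link traversals with separate messages and 3 with one chained message. The chained form is not always cheaper: with sharers 3 and 12 in opposite corners it needs 9 traversals against 6. The sketch counts link traversals only and does not model latency or acknowledgements.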