Heterogeneous system coherence for integrated CPU-GPU systems

  • Authors:
  • Jason Power;Arkaprava Basu;Junli Gu;Sooraj Puthoor;Bradford M. Beckmann;Mark D. Hill;Steven K. Reinhardt;David A. Wood

  • Affiliations:
  • University of Wisconsin-Madison; University of Wisconsin-Madison; Advanced Micro Devices, Inc.; Advanced Micro Devices, Inc.; Advanced Micro Devices, Inc.; University of Wisconsin-Madison and Advanced Micro Devices, Inc.; Advanced Micro Devices, Inc.; University of Wisconsin-Madison and Advanced Micro Devices, Inc.

  • Venue:
  • Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
  • Year:
  • 2013

Abstract

Many future heterogeneous systems will integrate CPUs and GPUs physically on a single chip and logically connect them via shared memory to avoid explicit data copying. Making this shared memory coherent facilitates programming and fine-grained sharing, but throughput-oriented GPUs can overwhelm CPUs with coherence requests not well-filtered by caches. Meanwhile, region coherence has been proposed for CPU-only systems to reduce snoop bandwidth by obtaining coherence permissions for large regions. This paper develops Heterogeneous System Coherence (HSC) for CPU-GPU systems to mitigate the coherence bandwidth effects of GPU memory requests. HSC replaces a standard directory with a region directory and adds a region buffer to the L2 cache. These structures allow the system to move bandwidth from the coherence network to the high-bandwidth direct-access bus without sacrificing coherence. Evaluation results with a subset of Rodinia benchmarks and the AMD APP SDK show that HSC can improve performance compared to a conventional directory protocol by an average of more than 2x and a maximum of more than 4.5x. Additionally, HSC reduces the bandwidth to the directory by an average of 94% and by more than 99% for four of the analyzed benchmarks.
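The core idea of region coherence described above is that a cache first obtains permission for a whole multi-block region, after which later accesses in that region need not consult the directory. The following is a minimal toy sketch of that filtering effect only, not the HSC protocol from the paper. It assumes 64-byte blocks, 1 KiB regions, a single requesting cache, only two permission levels (shared "S" and modified "M"), and no invalidations or downgrades from other agents. The names `RegionBuffer`, `region_of`, and `simulate` are hypothetical, and "direct" here simply stands for a request that can proceed over the direct-access path without a directory lookup.

```python
# Toy model of region-granularity permission tracking (illustrative only).
# ASSUMPTIONS (not from the paper): 64 B blocks, 1 KiB regions, one cache,
# permissions "S"/"M" only, and no invalidations from other agents.
BLOCK_BYTES = 64
REGION_BYTES = 1024  # hypothetical region size: 16 blocks per region


def region_of(addr: int) -> int:
    """Map a byte address to its region index."""
    return addr // REGION_BYTES


class RegionBuffer:
    """Hypothetical region buffer: records which regions this cache holds permission for."""

    def __init__(self) -> None:
        self.permissions: dict[int, str] = {}  # region index -> "S" or "M"

    def access(self, addr: int, is_write: bool) -> str:
        region = region_of(addr)
        perm = self.permissions.get(region)
        # Sufficient region permission already held: bypass the directory.
        if perm == "M" or (perm == "S" and not is_write):
            return "direct"
        # Region miss or read-to-write upgrade: one request to the region directory
        # grants permission for every block in the region.
        self.permissions[region] = "M" if is_write else "S"
        return "directory"


def simulate(addresses, is_write: bool = False) -> dict[str, int]:
    """Count how many accesses need the directory versus the direct path."""
    rb = RegionBuffer()
    counts = {"directory": 0, "direct": 0}
    for addr in addresses:
        counts[rb.access(addr, is_write)] += 1
    return counts


# A streaming, GPU-like read of 64 KiB, one block at a time (1024 accesses).
stream = range(0, 64 * 1024, BLOCK_BYTES)
print(simulate(stream))  # {'directory': 64, 'direct': 960}
```

In this toy streaming pattern only the first access to each of the 64 regions reaches the directory; the other 15 accesses per region are filtered. Real traffic with sharing between the CPU and GPU would add invalidation and downgrade messages that this sketch deliberately omits.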