Software-level scheduling to exploit non-uniformly shared data cache on GPGPU

  • Authors:
  • Bo Wu; Weilin Wang; Xipeng Shen

  • Affiliations:
  • The College of William and Mary; The College of William and Mary; The College of William and Mary

  • Venue:
  • Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
  • Year:
  • 2013

Abstract

Data caches were introduced to GPUs to mitigate the irregular memory access problem, but few studies have investigated how to exploit their full potential. In this work, we consider some important GPU applications that feature data sharing across thread blocks. We show that this sharing is not well exploited because the current GPU runtime ignores it when scheduling threads. We then present an application-level transformation that remaps thread blocks to data on the fly. With this software-level scheduler, thread blocks that share much data are scheduled onto the same streaming multiprocessor (SM) so that they share its cache. Experiments on four benchmarks show an average speedup of 1.23X.
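The abstract does not spell out the transformation, so the following is only a minimal CPU-side sketch of the general idea, not the authors' algorithm. It assumes each logical thread block carries a hypothetical "sharing group" label naming the data region it mostly reads, and that one worker per SM processes its own queue of logical blocks. The function name `schedule_by_sharing`, the labels, and the greedy least-loaded placement are all illustrative assumptions.

```python
from collections import defaultdict


def schedule_by_sharing(sharing_group, num_sms):
    """Assign logical thread blocks to per-SM queues.

    sharing_group[i] is a hypothetical label for the data region that
    logical block i mostly reads. Blocks with the same label are kept
    in one queue so that, in this simplified model, they would run on
    the same SM and could reuse its data cache.
    """
    # Collect logical block IDs by the data region they share.
    groups = defaultdict(list)
    for logical_block, label in enumerate(sharing_group):
        groups[label].append(logical_block)

    # Place whole groups, largest first, on the currently least-loaded SM.
    # This greedy balancing step is an assumption of the sketch.
    queues = [[] for _ in range(num_sms)]
    for members in sorted(groups.values(), key=len, reverse=True):
        target = min(range(num_sms), key=lambda sm: len(queues[sm]))
        queues[target].extend(members)
    return queues


# Eight logical blocks, three sharing groups, two SMs.
labels = [0, 1, 0, 1, 2, 2, 0, 1]
queues = schedule_by_sharing(labels, num_sms=2)
for sm, queue in enumerate(queues):
    print(f"SM {sm}: logical blocks {queue}")
```

In a real kernel, the equivalent step would have each running thread block look up which logical block to execute instead of using its hardware-assigned block index directly. How the paper performs that lookup on the fly is not stated in the abstract.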