Reducing divergence in GPGPU programs with loop merging

  • Authors: Tianyi David Han; Tarek S. Abdelrahman

  • Affiliations: University of Toronto, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada

  • Venue: Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
  • Year: 2013

Abstract

Branch divergence can incur a high performance penalty on GPGPU programs. We propose a software optimization, called loop merging, that aims to reduce the divergence that arises when the trip count of a loop varies across the threads of a warp. This optimization merges the divergent loop with one or more surrounding outer loops into a single loop. As a result, warp threads no longer have to wait for each other in each outer loop iteration, which improves execution efficiency. We implement loop merging in LLVM. Our evaluation on a Fermi GPU shows that it improves the performance of a synthetic benchmark and five application benchmarks by up to 1.6X and 4.3X, respectively.
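
The effect described in the abstract can be illustrated with a small host-side model. The sketch below is not the authors' LLVM implementation. It is a simplified simulation under stated assumptions: a warp executes in lockstep, only executions of the inner loop body are counted, and outer-loop bookkeeping is treated as free. The names `TripCounts`, `nested_steps`, and `merged_steps` are hypothetical and introduced here for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// trips[t][i] = inner-loop trip count of warp thread t in outer iteration i.
// Hypothetical input: every thread runs the same number of outer iterations.
using TripCounts = std::vector<std::vector<int>>;

// Original nested form. The warp reconverges after the inner loop in every
// outer iteration, so each outer iteration costs as many lockstep steps as
// its slowest thread needs.
long nested_steps(const TripCounts& trips) {
    if (trips.empty()) return 0;
    const std::size_t outer = trips[0].size();
    long steps = 0;
    for (std::size_t i = 0; i < outer; ++i) {
        int slowest = 0;
        for (const auto& thread : trips) {
            assert(thread.size() == outer);  // same outer trip count assumed
            slowest = std::max(slowest, thread[i]);
        }
        steps += slowest;
    }
    return steps;
}

// Merged form. Each thread runs one loop that carries both indices (i, j).
// A thread that finishes its inner iterations moves straight on to its next
// outer iteration instead of idling until the rest of the warp catches up.
long merged_steps(const TripCounts& trips) {
    struct State { std::size_t i = 0; int j = 0; };
    std::vector<State> state(trips.size());
    long steps = 0;
    for (;;) {
        bool any_active = false;
        for (std::size_t t = 0; t < trips.size(); ++t) {
            State& s = state[t];
            const auto& trip = trips[t];
            // Outer-loop bookkeeping folded into the merged loop
            // (assumed free in this cost model).
            while (s.i < trip.size() && s.j >= trip[s.i]) { ++s.i; s.j = 0; }
            if (s.i < trip.size()) {  // one execution of the inner body
                ++s.j;
                any_active = true;
            }
        }
        if (!any_active) break;
        ++steps;  // one lockstep step of the whole warp
    }
    return steps;
}
```

For a two-thread warp with trip counts `{{1, 4}, {4, 1}}`, `nested_steps` returns 8 (4 steps in each outer iteration), while `merged_steps` returns 5, since each thread needs only 5 inner-body executions in total. When all threads have identical trip counts the two forms cost the same in this model, which matches the abstract's framing of loop merging as a remedy for varying trip counts specifically.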