Reducing branch divergence in GPU programs

  • Authors:
  • Tianyi David Han;Tarek S. Abdelrahman

  • Affiliations:
  • University of Toronto, Toronto, Ontario, Canada;University of Toronto, Toronto, Ontario, Canada

  • Venue:
  • Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Branch divergence has a significant impact on the performance of GPU programs. We propose two novel software-based optimizations, called iteration delaying and branch distribution that aim to reduce branch divergence. Iteration delaying targets a divergent branch enclosed by a loop within a kernel. It improves performance by executing loop iterations that take the same branch direction and delaying those that take the other direction until later iterations. Branch distribution reduces the length of divergent code by factoring out structurally similar code from the branch paths. We conduct a preliminary evaluation of the two optimizations using both synthetic benchmarks and a highly-optimized real-world application. Our evaluation shows that they improve the performance of the synthetic benchmarks by as much as 30% and 80% respectively, and that of the real-world application by 12% and 16% respectively.