Hybrid Hexagonal/Classical Tiling for GPUs

  • Authors:
  • Tobias Grosser;Albert Cohen;Justin Holewinski;P. Sadayappan;Sven Verdoolaege

  • Affiliations:
  • INRIA and École Normale Supérieure;INRIA and École Normale Supérieure;NVIDIA Corporation;Ohio State University;INRIA, École Normale Supérieure and KU Leuven

  • Venue:
  • Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
  • Year:
  • 2014

Quantified Score

Hi-index 0.00

Visualization

Abstract

Time-tiling is necessary for the efficient execution of iterative stencil computations. Classical hyper-rectangular tiles cannot be used due to the combination of backward and forward dependences along space dimensions. Existing techniques trade temporal data reuse for inefficiencies in other areas, such as load imbalance, redundant computations, or increased control flow overhead, therefore making it challenging for use with GPUs. We propose a time-tiling method for iterative stencil computations on GPUs. Our method does not involve redundant computations. It favors coalesced global-memory accesses, data reuse in local/shared-memory or cache, avoidance of thread divergence, and concurrency, combining hexagonal tile shapes along the time and one spatial dimension with classical tiling along the other spatial dimensions. Hexagonal tiles expose multi-level parallelism as well as data reuse. Experimental results demonstrate significant performance improvements over existing stencil compilers.