Hybrid Hexagonal/Classical Tiling for GPUs

Authors:
Tobias Grosser;Albert Cohen;Justin Holewinski;P. Sadayappan;Sven Verdoolaege
Affiliations:
INRIA and École Normale Supérieure;INRIA and École Normale Supérieure;NVIDIA Corporation;Ohio State University;INRIA, École Normale Supérieure and KU Leuven
Venue:
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Year:
2014

Citing 17
Cited 0

Supernode partitioning

POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
Automatic Parallelization in the Polytope Model

The Data Parallel Programming Model: Foundations, HPF Realization, and Scientific Applications
Effective automatic parallelization of stencil computations

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Counting Integer Points in Parametric Polytopes Using Barvinok's Rational Functions

Algorithmica
A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction

Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
isl: an integer set library for the polyhedral model

ICMS'10 Proceedings of the Third international congress conference on Mathematical software
The pochoir stencil compiler

Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures

IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
Polyhedral code generation in the real world

CC'06 Proceedings of the 15th international conference on Compiler Construction
High-performance code generation for stencil computations on GPU architectures

Proceedings of the 26th ACM international conference on Supercomputing
Tiling stencil computations to maximize parallelism

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Code generation for parallel execution of a class of irregular loops on distributed memory systems

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Polyhedral parallel code generation for CUDA

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Split tiling for GPUs: automatic parallelization using trapezoidal tiles

Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
A stencil compiler for short-vector SIMD architectures

Proceedings of the 27th international ACM conference on International conference on supercomputing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Time-tiling is necessary for the efficient execution of iterative stencil computations. Classical hyper-rectangular tiles cannot be used due to the combination of backward and forward dependences along space dimensions. Existing techniques trade temporal data reuse for inefficiencies in other areas, such as load imbalance, redundant computations, or increased control flow overhead, therefore making it challenging for use with GPUs. We propose a time-tiling method for iterative stencil computations on GPUs. Our method does not involve redundant computations. It favors coalesced global-memory accesses, data reuse in local/shared-memory or cache, avoidance of thread divergence, and concurrency, combining hexagonal tile shapes along the time and one spatial dimension with classical tiling along the other spatial dimensions. Hexagonal tiles expose multi-level parallelism as well as data reuse. Experimental results demonstrate significant performance improvements over existing stencil compilers.