POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Automatic Parallelization in the Polytope Model
The Data Parallel Programming Model: Foundations, HPF Realization, and Scientific Applications
Effective automatic parallelization of stencil computations
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
isl: an integer set library for the polyhedral model
ICMS'10 Proceedings of the Third international congress conference on Mathematical software
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
Polyhedral code generation in the real world
CC'06 Proceedings of the 15th international conference on Compiler Construction
High-performance code generation for stencil computations on GPU architectures
Proceedings of the 26th ACM international conference on Supercomputing
Tiling stencil computations to maximize parallelism
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Code generation for parallel execution of a class of irregular loops on distributed memory systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Polyhedral parallel code generation for CUDA
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Split tiling for GPUs: automatic parallelization using trapezoidal tiles
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
A stencil compiler for short-vector SIMD architectures
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hi-index | 0.00 |
Time-tiling is necessary for the efficient execution of iterative stencil computations. Classical hyper-rectangular tiles cannot be used due to the combination of backward and forward dependences along space dimensions. Existing techniques trade temporal data reuse for inefficiencies in other areas, such as load imbalance, redundant computations, or increased control flow overhead, therefore making it challenging for use with GPUs. We propose a time-tiling method for iterative stencil computations on GPUs. Our method does not involve redundant computations. It favors coalesced global-memory accesses, data reuse in local/shared-memory or cache, avoidance of thread divergence, and concurrency, combining hexagonal tile shapes along the time and one spatial dimension with classical tiling along the other spatial dimensions. Hexagonal tiles expose multi-level parallelism as well as data reuse. Experimental results demonstrate significant performance improvements over existing stencil compilers.