Hierarchical overlapped tiling

Authors:
Xing Zhou;Jean-Pierre Giacalone;María Jesús Garzarán;Robert H. Kuhn;Yang Ni;David Padua
Affiliations:
University of Illinois at Urbana-Champaign;Intel Corporation;University of Illinois at Urbana-Champaign;Intel Corporation;Intel Corporation;University of Illinois at Urbana-Champaign
Venue:
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Year:
2012

Abstract

This paper introduces hierarchical overlapped tiling, a transformation that applies loop tiling and fusion to conventional loops. Overlapped tiling is a useful transformation to reduce communication overhead, but it may also generate a significant amount of redundant computation. Hierarchical overlapped tiling performs overlapped tiling hierarchically to balance communication overhead and redundant computation, and thus has the potential to provide better performance. In this paper, we describe the hierarchical overlapped tiling optimization and its implementation in an OpenCL compiler. We also evaluate the effectiveness of this optimization using 8 programs that implement different forms of stencil computation. Our results show that hierarchical overlapped tiling achieves an average 37% speedup over traditional tiling on a 32-core workstation.
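
The trade-off described in the abstract can be made concrete with a small sketch. It is not the paper's OpenCL implementation: it is a plain-Python illustration on an assumed 1-D, 3-point averaging stencil with fixed end cells. All names (`step`, `sweep`, `overlapped`, `hierarchical`) and parameters are hypothetical. The two-level scheme shown is one plausible reading of "performing overlapped tiling hierarchically": outer tiles carry a halo deep enough for all time steps and never communicate, while the inner tiles inside each outer tile carry a shallower halo and resynchronize every `inner_steps` steps. Each function returns the result together with the number of cell updates performed, so the redundant work is visible.

```python
def step(a):
    """One Jacobi step of a 3-point stencil; the two end cells stay fixed."""
    new = list(a)
    for i in range(1, len(a) - 1):
        new[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return new


def sweep(a, steps):
    """Untiled baseline. Returns (result, number of cell updates)."""
    for _ in range(steps):
        a = step(a)
    return a, steps * max(0, len(a) - 2)


def overlapped(a, steps, tile, advance=sweep):
    """Overlapped tiling: every tile works on a private copy of its cells
    plus a halo of `steps` cells per side (clamped at the array ends).

    Holding the copy's end cells fixed corrupts one more halo cell per time
    step, so after `steps` steps only the halo is wrong and the tile's own
    cells are exact. No data is exchanged between tiles.
    """
    n = len(a)
    out = list(a)
    updates = 0
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo = max(0, start - steps)   # left edge including halo
        hi = min(n, end + steps)     # right edge including halo
        local, work = advance(a[lo:hi], steps)
        updates += work
        out[start:end] = local[start - lo:end - lo]  # keep only the core
    return out, updates


def hierarchical(a, steps, outer_tile, inner_tile, inner_steps):
    """Assumed two-level scheme: outer tiles are overlapped for all `steps`;
    inside each one, inner tiles are overlapped for `inner_steps` at a time
    and then resynchronized through the outer tile's private copy."""
    def advance(local, total):
        work, done = 0, 0
        while done < total:
            k = min(inner_steps, total - done)
            local, w = overlapped(local, k, inner_tile)
            work += w
            done += k
        return local, work

    return overlapped(a, steps, outer_tile, advance)


if __name__ == "__main__":
    data = [float((7 * i) % 11) for i in range(64)]
    ref, base = sweep(data, 4)
    flat, flat_work = overlapped(data, 4, tile=8)
    hier, hier_work = hierarchical(data, 4, outer_tile=32, inner_tile=8,
                                   inner_steps=2)
    print("untiled updates:     ", base)
    print("overlapped updates:  ", flat_work)
    print("hierarchical updates:", hier_work)
```

In this sketch the update counts beyond the untiled baseline are the redundant computation: a deeper halo removes synchronization points but enlarges every tile's private region. Splitting the time steps between two levels lets the halo depth differ per level, which is the balance the abstract refers to. The actual tile shapes, halo depths, and mapping to OpenCL work-groups used in the paper are not reproduced here.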