Automatic code generation and tuning for stencil kernels on modern shared memory architectures

Authors:
Matthias Christen;Olaf Schenk;Helmar Burkhart
Affiliations:
Department of Mathematics and Computer Science, University of Basel, Basel, Switzerland 4056;Department of Mathematics and Computer Science, University of Basel, Basel, Switzerland 4056;Department of Mathematics and Computer Science, University of Basel, Basel, Switzerland 4056
Venue:
Computer Science - Research and Development
Year:
2011

Abstract

In this paper, we present Patus, a code generation and auto-tuning framework for stencil computations targeted at multi- and manycore processors, such as multicore CPUs and graphics processing units. Patus, which stands for "Parallel Autotuned Stencils," generates a compute kernel from two inputs: a specification of the stencil operation and a strategy that describes the parallelization and optimizations to be applied. It then uses auto-tuning to optimize the strategy-specific parameters for the given hardware architecture.
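
The separation described in the abstract (a stencil, a strategy with free parameters, and a tuner that picks those parameters for the machine at hand) can be illustrated with a small sketch. The code below is not Patus's specification or strategy syntax and does not reflect its generated code. It assumes a 5-point averaging stencil on a 2D grid, a loop-tiling strategy whose parameters are the two tile sizes, and an exhaustive timing search as the tuner. All function names (`stencil_reference`, `stencil_tiled`, `autotune`, `make_grid`) are hypothetical and chosen only for illustration.

```python
import itertools
import time


def stencil_reference(u):
    """Untiled 5-point average over the interior of a 2D grid (list of lists)."""
    n, m = len(u), len(u[0])
    out = [row[:] for row in u]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return out


def stencil_tiled(u, tile_i, tile_j):
    """Same stencil, with the interior traversed in tile_i x tile_j blocks.

    The tile sizes play the role of strategy-specific parameters (an
    assumption made for this sketch).
    """
    n, m = len(u), len(u[0])
    out = [row[:] for row in u]
    for ii in range(1, n - 1, tile_i):
        for jj in range(1, m - 1, tile_j):
            # Clamp each tile so it never reaches the boundary row or column.
            for i in range(ii, min(ii + tile_i, n - 1)):
                for j in range(jj, min(jj + tile_j, m - 1)):
                    out[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return out


def autotune(u, candidates, repeats=3):
    """Exhaustive search: time every (tile_i, tile_j) pair, return the fastest."""
    best_params, best_time = None, float("inf")
    for tile_i, tile_j in itertools.product(candidates, repeat=2):
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            stencil_tiled(u, tile_i, tile_j)
            # Keep the minimum over repeats to reduce timing noise.
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_params, best_time = (tile_i, tile_j), elapsed
    return best_params, best_time


def make_grid(n, m):
    """Deterministic test grid with n rows and m columns."""
    return [[float((i * m + j) % 7) for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    grid = make_grid(64, 64)
    params, seconds = autotune(grid, candidates=[4, 8, 16, 32])
    print("fastest tile sizes on this machine:", params, "time:", seconds)
```

Every tile size yields the same result as the untiled loop, so the tuner is free to choose purely on measured run time. In pure Python the timing differences mostly reflect interpreter overhead rather than cache behavior, so the sketch shows the structure of the approach, not realistic performance.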