PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures

Authors:
Matthias Christen;Olaf Schenk;Helmar Burkhart
Affiliations:
-;-;-
Venue:
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
Year:
2011

Citing 0
Cited 16

Auto-generation and auto-tuning of 3D stencil codes on GPU clusters

Proceedings of the Tenth International Symposium on Code Generation and Optimization
High-performance code generation for stencil computations on GPU architectures

Proceedings of the 26th ACM international conference on Supercomputing
A multi-objective auto-tuning framework for parallel codes

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Patus for convenient high-performance stencils: evaluation in earthquake simulations

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Tiling stencil computations to maximize parallelism

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
PARTANS: An autotuning framework for stencil computation on multi-GPU systems

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Split tiling for GPUs: automatic parallelization using trapezoidal tiles

Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
A stencil compiler for short-vector SIMD architectures

Proceedings of the 27th international ACM conference on International conference on supercomputing
A high-level synthesis flow for the implementation of iterative stencil loop algorithms on FPGA devices

Proceedings of the 50th Annual Design Automation Conference
A scalable, efficient scheme for evaluation of stencil computations over unstructured meshes

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Efficient 3D stencil computations using CUDA

Parallel Computing
Towards making autotuning mainstream

International Journal of High Performance Computing Applications
Hybrid Hexagonal/Classical Tiling for GPUs

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
On Expressing Strategies for Directive-Driven Multicore Programing Models

Proceedings of Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms
The Cetus Source-to-Source Compiler Infrastructure: Overview and Evaluation

International Journal of Parallel Programming
Autotuning Wavefront Applications for Multicore Multi-GPU Hybrid Architectures

Proceedings of Programming Models and Applications on Multicores and Manycores

Quantified Score

Hi-index	0.00

Visualization

Abstract

Stencil calculations comprise an important class of kernels in many scientific computing applications ranging from simple PDE solvers to constituent kernels in multigrid methods as well as image processing applications. In such types of solvers, stencil kernels are often the dominant part of the computation, and an efficient parallel implementation of the kernel is therefore crucial in order to reduce the time to solution. However, in the current complex hardware micro architectures, meticulous architecture-specific tuning is required to elicit the machine's full compute power. We present a code generation and auto-tuning framework \textsc{Patus} for stencil computations targeted at multi- and many core processors, such as multicore CPUs and graphics processing units, which makes it possible to generate compute kernels from a specification of the stencil operation and a parallelization and optimization strategy, and leverages the auto tuning methodology to optimize strategy-dependent parameters for the given hardware architecture.