A scalable, efficient scheme for evaluation of stencil computations over unstructured meshes

Authors:
James King; Robert M. Kirby
Affiliations:
University of Utah, Salt Lake City, UT; University of Utah, Salt Lake City, UT
Venue:
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2013

Abstract

Stencil computations are a common class of operations in computational science and engineering applications. They often benefit from compile-time analysis, exploitation of data locality, and parallelism. Post-processing of discontinuous Galerkin (dG) simulation solutions with B-spline kernels is one example of a numerical method that requires evaluating computationally intensive stencil operations over a mesh. Previous work on stencil computations has focused on structured meshes and has given little attention to unstructured meshes. Performing stencil operations over an unstructured mesh requires sampling heterogeneous elements, which often leads to inefficient memory access patterns and limits data locality and reuse. In this paper, we present an efficient method for performing stencil computations over unstructured meshes that increases data locality and cache efficiency, together with a scalable approach for stencil tiling and concurrent execution. We provide experimental results, in the context of post-processing dG solutions, that demonstrate the effectiveness of our approach.
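To make concrete why an unstructured mesh complicates stencil evaluation, here is a minimal sketch. It is not the authors' scheme. It assumes that elements are represented only by axis-aligned bounding boxes, that the stencil support is a square of half-width `half_width`, and that a uniform bucket grid is an acceptable spatial index. The names `build_buckets`, `elements_under_stencil`, and `brute_force` are hypothetical. On a structured mesh the elements under a stencil follow from index arithmetic; here they have to be found by a spatial query, and the bucket grid limits each query to nearby elements.

```python
from collections import defaultdict
from math import floor


def build_buckets(boxes, cell):
    """Map each grid cell (i, j) to the indices of element boxes overlapping it.

    boxes: list of (xmin, ymin, xmax, ymax) bounding boxes (assumed input format).
    cell:  edge length of one bucket-grid cell (a tuning parameter, assumed).
    """
    buckets = defaultdict(list)
    for idx, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        for i in range(floor(xmin / cell), floor(xmax / cell) + 1):
            for j in range(floor(ymin / cell), floor(ymax / cell) + 1):
                buckets[(i, j)].append(idx)
    return buckets


def elements_under_stencil(point, half_width, boxes, buckets, cell):
    """Return sorted indices of elements whose boxes intersect the stencil support."""
    px, py = point
    sx0, sx1 = px - half_width, px + half_width
    sy0, sy1 = py - half_width, py + half_width
    found = set()
    # Visit only the grid cells touched by the square stencil support.
    for i in range(floor(sx0 / cell), floor(sx1 / cell) + 1):
        for j in range(floor(sy0 / cell), floor(sy1 / cell) + 1):
            for idx in buckets.get((i, j), ()):
                xmin, ymin, xmax, ymax = boxes[idx]
                # Closed-interval overlap test between box and support.
                if xmin <= sx1 and xmax >= sx0 and ymin <= sy1 and ymax >= sy0:
                    found.add(idx)
    return sorted(found)


def brute_force(point, half_width, boxes):
    """Reference answer: test every element box against the stencil support."""
    px, py = point
    sx0, sx1 = px - half_width, px + half_width
    sy0, sy1 = py - half_width, py + half_width
    return [
        idx
        for idx, (xmin, ymin, xmax, ymax) in enumerate(boxes)
        if xmin <= sx1 and xmax >= sx0 and ymin <= sy1 and ymax >= sy0
    ]
```

The bucketed query returns the same element set as the brute-force scan but inspects only elements registered in cells near the evaluation point. Evaluation points that fall in the same cells also touch the same element data, which is the kind of reuse that tiling aims to exploit.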