Stencil computations are a common class of operations that appear in many computational science and engineering applications. They often benefit from compile-time analysis, exploitation of data locality, and parallelism. Post-processing of discontinuous Galerkin (dG) simulation solutions with B-spline kernels is one example of a numerical method that requires evaluating computationally intensive stencil operations over a mesh. Previous work on stencil computations has focused on structured meshes and has given little attention to unstructured meshes. Performing stencil operations over an unstructured mesh requires sampling heterogeneous elements, which often leads to inefficient memory access patterns and limits data locality and reuse. In this paper, we present an efficient method for performing stencil computations over unstructured meshes that increases data locality and cache efficiency, together with a scalable approach for stencil tiling and concurrent execution. We provide experimental results in the context of post-processing of dG solutions that demonstrate the effectiveness of our approach.
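
To make the contrast between structured and unstructured meshes concrete, the following minimal Python sketch applies the same weighted-average stencil in both settings. It is only an illustration and not the method presented in the paper: the function names, the 1D layout, and the adjacency-list representation (`neighbors`, `weights`) are assumptions chosen for brevity. On the structured grid, neighbours are reached by fixed index offsets, while on the unstructured mesh every read goes through an index list and may touch scattered memory locations.

```python
def structured_stencil(values, weights):
    """Apply a 3-point stencil on a 1D regular grid (interior points only)."""
    w_left, w_center, w_right = weights
    out = list(values)  # boundary values are copied unchanged
    for i in range(1, len(values) - 1):
        # Neighbours are found by fixed offsets, so memory access is contiguous.
        out[i] = (w_left * values[i - 1]
                  + w_center * values[i]
                  + w_right * values[i + 1])
    return out


def unstructured_stencil(values, neighbors, weights):
    """Apply a stencil on a mesh stored as adjacency lists.

    neighbors[i] lists the element indices that contribute to element i,
    and weights[i] holds the matching coefficients. Both layouts are
    hypothetical and chosen only for this illustration.
    """
    out = []
    for nbrs, wts in zip(neighbors, weights):
        # Each read of values[j] is indirect and possibly scattered in memory.
        out.append(sum(w * values[j] for j, w in zip(nbrs, wts)))
    return out


values = [1.0, 2.0, 4.0, 8.0]
avg = (0.25, 0.5, 0.25)
structured = structured_stencil(values, avg)

# The same averaging stencil expressed through explicit neighbour lists.
neighbors = [[0], [0, 1, 2], [1, 2, 3], [3]]
weights = [[1.0], list(avg), list(avg), [1.0]]
unstructured = unstructured_stencil(values, neighbors, weights)
```

Both calls produce the same values here because the neighbour lists encode the regular grid. In a real unstructured mesh the indices in `neighbors[i]` follow no fixed pattern, which is the source of the irregular memory access described above.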