Network flows: theory, algorithms, and applications
Network flows: theory, algorithms, and applications
Introduction to algorithms
Effective automatic parallelization of stencil computations
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
A practical automatic polyhedral parallelizer and locality optimizer
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
3D finite difference computation on GPUs using CUDA
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
Mapping the FDTD Application to Many-Core Chip Architectures
ICPP '09 Proceedings of the 2009 International Conference on Parallel Processing
Cache oblivious parallelograms in iterative stencil computations
Proceedings of the 24th ACM International Conference on Supercomputing
Data layout transformation for stencil computations on short-vector SIMD architectures
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
PADS: A Pattern-Driven Stencil Compiler-Based Tool for Reuse of Optimizations on GPGPUs
ICPADS '11 Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems
High-performance code generation for stencil computations on GPU architectures
Proceedings of the 26th ACM international conference on Supercomputing
Tiling stencil computations to maximize parallelism
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Split tiling for GPUs: automatic parallelization using trapezoidal tiles
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Hybrid Hexagonal/Classical Tiling for GPUs
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Hi-index | 0.00 |
Stencil computations are an integral component of applications in a number of scientific computing domains. Short-vector SIMD instruction sets are ubiquitous on modern processors and can be used to significantly increase the performance of stencil computations. Traditional approaches to optimizing stencils on these platforms have focused on either short-vector SIMD or data locality optimizations. In this paper, we propose a domain specific language and compiler for stencil computations that allows specification of stencils in a concise manner and automates both locality and short-vector SIMD optimizations, along with effective utilization of multi-core parallelism. Loop transformations to enhance data locality and enable load-balanced parallelism are combined with a data layout transformation to effectively increase the performance of stencil computations. Performance increases are demonstrated for a number of stencils on several modern SIMD architectures.