Image processing operations like blurring, inverse convolution, and summed-area tables are often computed efficiently as a sequence of 1D recursive filters. While much research has explored parallel recursive filtering, prior techniques do not optimize across the entire filter sequence. Typically, a separate filter (or often a causal-anticausal filter pair) is required in each dimension. Computing these filter passes independently results in significant traffic to global memory, creating a bottleneck in GPU systems. We present a new algorithmic framework for parallel evaluation. It partitions the image into 2D blocks, with a small band of additional data buffered along each block perimeter. We show that these perimeter bands are sufficient to accumulate the effects of the successive filters. A remarkable result is that the image data is read only twice and written just once, independent of image size, and thus total memory bandwidth is reduced even compared to the traditional serial algorithm. We demonstrate significant speedups in GPU computation.
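The claim that a small amount of boundary data is enough to carry a recursive filter across blocks can be illustrated in one dimension. The following is a minimal sketch, not the authors' GPU algorithm: it assumes a first-order causal filter y[i] = x[i] + a * y[i-1], and the names `causal_filter` and `blocked_causal_filter` are hypothetical. Each block is filtered independently with a zero prologue, a single carry value per block boundary is then propagated, and each block is corrected from its incoming carry using linearity.

```python
def causal_filter(x, a, prologue=0.0):
    """Serial first-order recursive filter: y[i] = x[i] + a * y[i-1]."""
    y = []
    prev = prologue  # value of y just before the first sample
    for v in x:
        prev = v + a * prev
        y.append(prev)
    return y


def blocked_causal_filter(x, a, block):
    """Same filter as causal_filter, evaluated block by block.

    Illustrative assumption: one carry value per block boundary plays the
    role of the buffered perimeter data described in the abstract.
    """
    blocks = [x[i:i + block] for i in range(0, len(x), block)]

    # Stage 1: filter every block with a zero prologue.
    # These calls are independent and could run in parallel.
    local = [causal_filter(b, a) for b in blocks]

    # Stage 2: propagate carries across block boundaries (short serial pass).
    # The carry into block k is the true last output of block k-1.
    carries = [0.0]
    for yb in local[:-1]:
        carries.append(yb[-1] + a ** len(yb) * carries[-1])

    # Stage 3: correct each block. By linearity, a prologue value c adds
    # a**(j+1) * c to the output at offset j inside the block.
    out = []
    for yb, c in zip(local, carries):
        out.extend(v + a ** (j + 1) * c for j, v in enumerate(yb))
    return out
```

For any input, `blocked_causal_filter(x, a, block)` should agree with `causal_filter(x, a)` up to floating-point rounding. Stages 1 and 3 touch every sample, while stage 2 touches only one value per block, which is the kind of saving the perimeter bands provide in the 2D case.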