Efficient Mapping of Multiresolution Image Filtering Algorithms on Graphics Processors

Authors:
Richard Membarth, Frank Hannig, Hritam Dutta, Jürgen Teich
Affiliations:
Hardware/Software Co-Design, Department of Computer Science, University of Erlangen-Nuremberg, Germany (all four authors)
Venue:
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Year:
2009

Abstract

In the last decade, there has been a dramatic growth in research and development of massively parallel commodity graphics hardware, both in academia and in industry. Graphics card architectures provide an optimal platform for the parallel execution of many number-crunching loop programs from fields such as image processing and linear algebra. However, it is hard to map such algorithms efficiently to graphics hardware, even with detailed insight into the architecture. This paper presents a multiresolution image processing algorithm and shows how this type of algorithm can be mapped efficiently to graphics hardware. Furthermore, the impact of the execution configuration is illustrated, and a method is proposed to determine the best configuration offline so that it can be used at run-time. Using CUDA as the programming model, it is demonstrated that the image processing algorithm is significantly accelerated: a speedup of up to 33x can be achieved on NVIDIA's Tesla C870 compared to a parallelized implementation on a Xeon Quad Core.
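To make the structure of a multiresolution filter concrete, here is a minimal CPU reference sketch in C++. It assumes a Laplacian-pyramid decomposition with a 3x3 binomial reduce step and a bilinear expand step, and it uses simple coring of the band-pass levels as a hypothetical stand-in for the per-level filter. None of these choices, nor the names `Image`, `reduce`, `expand`, `decompose`, `reconstruct`, `core_bands` and `max_abs_diff`, are taken from the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Grayscale image, row-major. at() clamps coordinates to the border.
struct Image {
    int w = 0, h = 0;
    std::vector<float> px;
    Image() = default;
    Image(int w_, int h_) : w(w_), h(h_), px(static_cast<std::size_t>(w_) * h_, 0.0f) {}
    float at(int x, int y) const {
        x = std::min(std::max(x, 0), w - 1);
        y = std::min(std::max(y, 0), h - 1);
        return px[static_cast<std::size_t>(y) * w + x];
    }
    float& ref(int x, int y) { return px[static_cast<std::size_t>(y) * w + x]; }
};

// Smooth with a separable 3x3 binomial kernel, then subsample by 2.
// Every output pixel is independent (one GPU thread per pixel in a CUDA port).
Image reduce(const Image& in) {
    static const float k[3] = {0.25f, 0.5f, 0.25f};
    Image out((in.w + 1) / 2, (in.h + 1) / 2);
    for (int y = 0; y < out.h; ++y)
        for (int x = 0; x < out.w; ++x) {
            float s = 0.0f;
            for (int j = -1; j <= 1; ++j)
                for (int i = -1; i <= 1; ++i)
                    s += k[i + 1] * k[j + 1] * in.at(2 * x + i, 2 * y + j);
            out.ref(x, y) = s;
        }
    return out;
}

// Upsample a coarser level to w x h by bilinear interpolation.
Image expand(const Image& in, int w, int h) {
    Image out(w, h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int x0 = x / 2, y0 = y / 2;
            float fx = (x % 2) ? 0.5f : 0.0f, fy = (y % 2) ? 0.5f : 0.0f;
            float top = (1 - fx) * in.at(x0, y0) + fx * in.at(x0 + 1, y0);
            float bot = (1 - fx) * in.at(x0, y0 + 1) + fx * in.at(x0 + 1, y0 + 1);
            out.ref(x, y) = (1 - fy) * top + fy * bot;
        }
    return out;
}

// Laplacian pyramid: entries 0..levels-2 are band-pass images,
// the last entry is the coarsest low-pass residual.
std::vector<Image> decompose(const Image& img, int levels) {
    std::vector<Image> pyr;
    Image cur = img;
    for (int l = 0; l + 1 < levels; ++l) {
        Image down = reduce(cur);
        Image band = expand(down, cur.w, cur.h);
        for (std::size_t i = 0; i < band.px.size(); ++i) band.px[i] = cur.px[i] - band.px[i];
        pyr.push_back(band);
        cur = down;
    }
    pyr.push_back(cur);
    return pyr;
}

// Collapse the pyramid from coarse to fine: expand, then add each band.
Image reconstruct(const std::vector<Image>& pyr) {
    assert(!pyr.empty());
    Image cur = pyr.back();
    for (int l = static_cast<int>(pyr.size()) - 2; l >= 0; --l) {
        Image up = expand(cur, pyr[l].w, pyr[l].h);
        for (std::size_t i = 0; i < up.px.size(); ++i) up.px[i] += pyr[l].px[i];
        cur = up;
    }
    return cur;
}

// Hypothetical per-level filter (simple coring): zero band-pass coefficients
// whose magnitude is below `threshold`; the low-pass residual is untouched.
void core_bands(std::vector<Image>& pyr, float threshold) {
    for (std::size_t l = 0; l + 1 < pyr.size(); ++l)
        for (float& v : pyr[l].px)
            if (std::fabs(v) < threshold) v = 0.0f;
}

// Largest per-pixel difference between two images of equal size.
float max_abs_diff(const Image& a, const Image& b) {
    float m = 0.0f;
    for (std::size_t i = 0; i < a.px.size(); ++i) m = std::max(m, std::fabs(a.px[i] - b.px[i]));
    return m;
}
```

A typical use is `decompose`, then a filter on each level, then `reconstruct`; with no filtering, reconstruction returns the input up to floating-point rounding. Because every output pixel of `reduce`, `expand` and the per-level filter depends only on a small neighborhood of its input level, each loop nest maps naturally to a CUDA kernel with one thread per pixel. The thread-block shape chosen for such a kernel is the kind of execution-configuration parameter that the abstract proposes to tune offline.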