GPU optimization of convolution for large 3-D real images

Authors:
Pavel Karas;David Svoboda;Pavel Zemčík
Affiliations:
Centre for Biomedical Image Analysis, Faculty of Informatics, Masaryk University, Brno, Czech Republic;Centre for Biomedical Image Analysis, Faculty of Informatics, Masaryk University, Brno, Czech Republic;Dept. of Computer Graphics and Multimedia, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
Venue:
ACIVS'12 Proceedings of the 14th international conference on Advanced Concepts for Intelligent Vision Systems
Year:
2012

Abstract

In this paper, we propose a method for computing the convolution of large 3-D images for real-valued signals. The convolution is performed in the frequency domain using the convolution theorem. Owing to the properties of real signals, the algorithm can be optimized so that both the computation time and the memory consumption are halved compared to complex signals of the same size. The convolution is decomposed in the frequency domain using the decimation-in-frequency (DIF) algorithm. The algorithm is accelerated on graphics hardware by means of the CUDA parallel computing model, achieving up to a 10× speedup with a single GPU over an optimized implementation on a quad-core CPU.
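
To illustrate the idea of decomposing a convolution in the frequency domain, the following is a minimal 1-D sketch in plain Python. It is not the authors' 3-D CUDA implementation: the function names (`dif_split`, `dif_merge`, `convolve_dif`) are hypothetical, the signal length is assumed to be a power of two, and the convolution is circular. One DIF step turns a length-N convolution into two independent length-N/2 convolutions, which presumably is what allows sub-problems to be processed separately on a device with limited memory. The real-signal optimization mentioned in the abstract is not reproduced here.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT, unnormalized; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def circular_convolve_fft(a, b):
    """Circular convolution via the convolution theorem."""
    n = len(a)
    spectrum = [p * q for p, q in zip(fft(a), fft(b))]
    return [v / n for v in fft(spectrum, inverse=True)]


def dif_split(x):
    """One DIF step: return two half-length signals whose FFTs are the
    even-indexed and odd-indexed bins of FFT(x), respectively."""
    n_full = len(x)
    h = n_full // 2
    lo = [x[n] + x[n + h] for n in range(h)]
    hi = [(x[n] - x[n + h]) * cmath.exp(-2j * cmath.pi * n / n_full)
          for n in range(h)]
    return lo, hi


def dif_merge(lo, hi):
    """Invert dif_split: rebuild the full-length signal from its two parts."""
    h = len(lo)
    n_full = 2 * h
    out = [0j] * n_full
    for n in range(h):
        u = hi[n] * cmath.exp(2j * cmath.pi * n / n_full)  # undo the twiddle
        out[n] = (lo[n] + u) / 2
        out[n + h] = (lo[n] - u) / 2
    return out


def convolve_dif(image, kernel):
    """Circular convolution computed as two independent half-size convolutions."""
    img_lo, img_hi = dif_split(image)
    ker_lo, ker_hi = dif_split(kernel)
    out_lo = circular_convolve_fft(img_lo, ker_lo)  # even frequency bins
    out_hi = circular_convolve_fft(img_hi, ker_hi)  # odd frequency bins
    return [v.real for v in dif_merge(out_lo, out_hi)]


def convolve_direct(a, b):
    """Reference O(N^2) circular convolution, used only for checking."""
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]
```

Calling `convolve_dif(image, kernel)` on two equal-length real lists should agree with `convolve_direct(image, kernel)` up to floating-point rounding. Note that the odd-bin half is complex even for real input, so a real-signal optimization of the kind the abstract describes needs additional care beyond this sketch.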