On the use of small 2D convolutions on GPUs

Authors:
Shams A. H. Al Umairy; Alexander S. van Amesfoort; Irwan D. Setija; Martijn C. van Beurden; Henk J. Sips
Affiliations:
Delft University of Technology, Delft, The Netherlands; Delft University of Technology, Delft, The Netherlands; ASML, Eindhoven, The Netherlands; Eindhoven University of Technology, Eindhoven, The Netherlands; Delft University of Technology, Delft, The Netherlands
Venue:
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Year:
2010

Abstract

Computing many small 2D convolutions with FFTs underlies a large number of applications in science and engineering, among them electromagnetic diffraction modeling in physics. The GPU appears well suited to accelerating these convolutions, but reaching high application performance requires substantial development time and non-portable optimizations. In this work, we present techniques, performance results, and considerations for accelerating small 2D convolutions using CUDA, and we compare performance against a multi-threaded CPU implementation. To improve the programmability and performance of applications that make heavy use of small convolutions, we argue that two improvements to software and hardware are needed: FFT libraries must be extended with a single convolution function, and the communication bandwidth between CPU and GPU must be drastically improved.
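
As an illustration of the operation the abstract refers to, the following minimal Python sketch computes a small 2D circular convolution in three steps: forward transform, pointwise product, inverse transform. It is not the authors' CUDA implementation. A naive DFT stands in for an FFT library so that the example has no dependencies, and the function names (`dft2`, `circular_conv2_direct`, `circular_conv2_fft`) are hypothetical. A linear (non-circular) convolution would additionally require zero-padding both inputs, which is omitted here.

```python
import cmath


def dft2(a, inverse=False):
    """Naive 2D DFT of a list of lists (O(N^4); stands in for an FFT)."""
    rows, cols = len(a), len(a[0])
    sign = 1 if inverse else -1
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = sign * 2 * cmath.pi * (u * y / rows + v * x / cols)
                    acc += a[y][x] * cmath.exp(1j * angle)
            # The inverse transform carries the 1/(rows*cols) normalization.
            out[u][v] = acc / (rows * cols) if inverse else acc
    return out


def circular_conv2_direct(f, g):
    """Reference 2D circular convolution computed from the definition."""
    rows, cols = len(f), len(f[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            out[y][x] = sum(
                f[j][i] * g[(y - j) % rows][(x - i) % cols]
                for j in range(rows)
                for i in range(cols)
            )
    return out


def circular_conv2_fft(f, g):
    """2D circular convolution via the convolution theorem."""
    F = dft2(f)
    G = dft2(g)
    # Pointwise product in the frequency domain.
    H = [[F[u][v] * G[u][v] for v in range(len(F[0]))] for u in range(len(F))]
    h = dft2(H, inverse=True)
    # Inputs are real, so the imaginary parts are only rounding noise.
    return [[value.real for value in row] for row in h]


if __name__ == "__main__":
    f = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0],
         [13.0, 14.0, 15.0, 16.0]]
    g = [[0.0, 1.0, 0.0, 0.0],
         [1.0, 2.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
    direct = circular_conv2_direct(f, g)
    via_dft = circular_conv2_fft(f, g)
    max_err = max(abs(direct[y][x] - via_dft[y][x])
                  for y in range(4) for x in range(4))
    print("max abs difference:", max_err)
```

In this sketch each convolution takes two forward transforms, one pointwise multiplication, and one inverse transform as separate steps. That is the sequence the abstract proposes FFT libraries expose as a single convolution function.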