A Parallel Implementation of the 2D Wavelet Transform Using CUDA

Authors:
Joaquín Franco; Gregorio Bernabé; Juan Fernández; Manuel E. Acacio
Venue:
PDP '09 Proceedings of the 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing
Year:
2009

Cited by (4):

Parallel processing with CUDA in ceramic tiles classification
KES'10 Proceedings of the 14th international conference on Knowledge-based and intelligent information and engineering systems: Part I

Wavelet techniques for option pricing on advanced architectures
Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing

Fast wavelet transform utilizing a multicore-aware framework
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2

Two-Dimensional discrete wavelet transform on large images for hybrid computing architectures: GPU and CELL
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing

Abstract

The NVIDIA Tesla boards are a multicore platform that is currently attracting enormous attention because of its potential for high sustained performance. These cards, intended for general-purpose computing on graphics processing units (GPGPU), are used as data-parallel computing devices. They are based on the Compute Unified Device Architecture (CUDA), which is common to the latest NVIDIA GPUs. The bottom line is a multicore platform that offers a large potential performance benefit, driven by a non-traditional programming model. In this paper we try to provide some insight into the peculiarities of CUDA for scientific computing by means of a specific example. In particular, we show that the parallelization of the two-dimensional fast wavelet transform on the NVIDIA Tesla C870 achieves a speedup of 20.8 for an image size of 8192x8192, including the data transfers between main memory and device memory, when compared with the fastest host-only implementation using OpenMP.