Fast 3D wavelet transform on multicore and many-core computing platforms

Authors:
V. Galiano;O. López-Granado;M. P. Malumbres;H. Migallón
Affiliations:
Physics and Computer Architecture Dept., Miguel Hernández University, Elche, Spain 03202;Physics and Computer Architecture Dept., Miguel Hernández University, Elche, Spain 03202;Physics and Computer Architecture Dept., Miguel Hernández University, Elche, Spain 03202;Physics and Computer Architecture Dept., Miguel Hernández University, Elche, Spain 03202
Venue:
The Journal of Supercomputing
Year:
2013

Abstract

The three-dimensional discrete wavelet transform (3D-DWT) has attracted the attention of the research community, especially in areas such as video watermarking, compression of volumetric medical data, multispectral image coding, 3D model coding, and video coding. In this work, we present several strategies to speed up the 3D-DWT computation through multicore processing. An in-depth analysis of the available compiler optimizations is also presented. Depending on both the multicore platform and the GOP size, the developed parallel algorithm obtains efficiencies above 95 % using up to four cores (or processes), and above 83 % using up to 12 cores. Furthermore, the extra memory requirement is under 0.12 % for low-resolution video frames and under 0.017 % for high-resolution video frames. We also present a CUDA-based algorithm that computes the 3D-DWT using shared memory to cover the extra memory demands, obtaining speed-ups of up to 12.68 on the many-core GTX280 platform. In areas such as video processing or ultra-high-definition image processing, memory requirements can significantly degrade the performance of an algorithm. Our algorithm, however, increases the memory requirements by a negligible percentage and performs a nearly in-place computation of the 3D-DWT, whereas other state-of-the-art 3D-DWT algorithms commonly store the computed wavelet coefficients in a separate memory space, thereby doubling the memory requirements.
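
The idea of a nearly in-place 3D-DWT can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the wavelet filter or the buffering scheme, so the Haar filter, the single reusable line buffer, the flat [frame][row][col] layout, and the function names `haar_1d_inplace` and `dwt3d_one_level` are all assumptions made for illustration. The sketch applies one decomposition level along each of the three axes in turn and writes the coefficients back into the input array, so the only extra storage is one line of samples.

```python
import math

def haar_1d_inplace(data, start, stride, n, buf):
    """Apply one Haar analysis level to n samples at data[start + i*stride].

    Assumption: Haar filter, n even. Results are staged in buf and copied
    back, so the input array itself receives the coefficients.
    """
    half = n // 2
    s = 1.0 / math.sqrt(2.0)
    for i in range(half):
        a = data[start + (2 * i) * stride]
        b = data[start + (2 * i + 1) * stride]
        buf[i] = (a + b) * s          # low-pass (approximation) coefficient
        buf[half + i] = (a - b) * s   # high-pass (detail) coefficient
    for i in range(n):
        data[start + i * stride] = buf[i]

def dwt3d_one_level(data, frames, rows, cols):
    """One-level separable 3D Haar DWT on a flat list laid out [frame][row][col].

    Returns the number of extra samples allocated (the line buffer length).
    All three dimensions are assumed to be even.
    """
    buf = [0.0] * max(frames, rows, cols)  # the only extra memory used
    # Pass 1: filter along each row (contiguous samples).
    for f in range(frames):
        for r in range(rows):
            haar_1d_inplace(data, (f * rows + r) * cols, 1, cols, buf)
    # Pass 2: filter along each column inside every frame.
    for f in range(frames):
        for c in range(cols):
            haar_1d_inplace(data, f * rows * cols + c, cols, rows, buf)
    # Pass 3: filter along the temporal axis (across the frames of a GOP).
    for r in range(rows):
        for c in range(cols):
            haar_1d_inplace(data, r * cols + c, rows * cols, frames, buf)
    return len(buf)
```

In this sketch the extra memory is `len(buf)` samples against `frames * rows * cols` samples of input, so its share shrinks as the frame resolution grows. That is consistent in spirit with the trend reported in the abstract, although the percentages given there refer to the authors' own algorithm, not to this illustration.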