The three-dimensional wavelet transform (3D-DWT) has attracted the attention of the research community, above all in areas such as video watermarking, compression of volumetric medical data, multispectral image coding, 3D model coding, and video coding. In this work, we present several strategies to speed up the 3D-DWT computation through multicore processing, together with an in-depth analysis of the available compiler optimizations. Depending on both the multicore platform and the GOP size, the developed parallel algorithm obtains efficiencies above 95% using up to four cores (or processes), and above 83% using up to 12 cores. Furthermore, the extra memory required is under 0.12% for low-resolution video frames and under 0.017% for high-resolution video frames. We also present a CUDA-based algorithm that computes the 3D-DWT using shared memory for the extra memory demands, obtaining speed-ups of up to 12.68 on the many-core GTX280 platform. In areas such as video processing or ultra-high-definition image processing, memory requirements can significantly degrade the performance of the developed algorithms. Our algorithm, however, increases the memory requirements by only a negligible percentage and performs a nearly in-place computation of the 3D-DWT, whereas other state-of-the-art 3D-DWT algorithms commonly store the computed wavelet coefficients in a separate memory space, thereby doubling the memory requirements.
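To make the idea of a nearly in-place transform concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single decomposition level, a simple averaging Haar filter (the abstract does not name the filter used), even dimensions, and a volume stored as a flat list. The function names `haar_1d_inplace` and `dwt3d_one_level` are hypothetical. The only extra storage is one scratch line, so the overhead relative to the volume shrinks as the frames grow.

```python
def haar_1d_inplace(data, start, stride, n, buf):
    """One Haar analysis level on the n samples data[start + i*stride].

    buf is a scratch line of at least n elements; results are written
    back over the input, low-pass half first, then high-pass half.
    """
    half = n // 2
    for i in range(half):
        a = data[start + (2 * i) * stride]
        b = data[start + (2 * i + 1) * stride]
        buf[i] = (a + b) / 2.0         # low-pass: pairwise average
        buf[half + i] = (a - b) / 2.0  # high-pass: pairwise difference
    for i in range(n):
        data[start + i * stride] = buf[i]


def dwt3d_one_level(vol, t, h, w):
    """Filter a flat t*h*w volume along width, height and time.

    Returns the extra-memory fraction (scratch size / volume size).
    Assumes t, h and w are all even (assumption of this sketch).
    """
    buf = [0.0] * max(t, h, w)  # the only extra memory allocated
    for k in range(t):          # width axis, stride 1
        for j in range(h):
            haar_1d_inplace(vol, (k * h + j) * w, 1, w, buf)
    for k in range(t):          # height axis, stride w
        for i in range(w):
            haar_1d_inplace(vol, k * h * w + i, w, h, buf)
    for j in range(h):          # temporal axis, stride h*w
        for i in range(w):
            haar_1d_inplace(vol, j * w + i, h * w, t, buf)
    return len(buf) / len(vol)


if __name__ == "__main__":
    t, h, w = 2, 4, 4
    volume = [3.0] * (t * h * w)  # constant test volume
    overhead = dwt3d_one_level(volume, t, h, w)
    print(volume[0], overhead)
```

Within each pass the lines are independent of one another, so in a multicore version of this sketch each worker would need only its own scratch line; how the paper actually partitions the work and sizes its buffers is not stated in the abstract.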