Fast 3D wavelet transform on multicore and many-core computing platforms

  • Authors:
  • V. Galiano;O. López-Granado;M. P. Malumbres;H. Migallón

  • Affiliations:
  • Physics and Computer Architecture Dept., Miguel Hernández University, Elche, Spain 03202 (all four authors)

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 2013

Abstract

The three-dimensional discrete wavelet transform (3D-DWT) has attracted considerable attention from the research community, particularly in areas such as video watermarking, compression of volumetric medical data, multispectral image coding, 3D model coding, and video coding. In this work, we present several strategies to speed up the 3D-DWT computation through multicore processing. An in-depth analysis of the available compiler optimizations is also presented. Depending on both the multicore platform and the group-of-pictures (GOP) size, the developed parallel algorithm obtains efficiencies above 95% using up to four cores (or processes), and above 83% using up to 12 cores. Furthermore, the extra memory requirement is under 0.12% for low-resolution video frames and under 0.017% for high-resolution video frames. We also present a CUDA-based algorithm that computes the 3D-DWT using shared memory for the extra memory demands, obtaining speed-ups of up to 12.68 on the many-core GTX280 platform. In areas such as video processing or ultra-high-definition image processing, memory requirements can significantly degrade the performance of an algorithm. Our algorithm, however, increases the memory requirements by a negligible percentage and can therefore perform a nearly in-place computation of the 3D-DWT. By contrast, other state-of-the-art 3D-DWT algorithms commonly store the computed wavelet coefficients in a separate memory space, thereby doubling the memory requirements.
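To make the idea of a nearly in-place 3D-DWT concrete, the following is a minimal single-threaded sketch, not the authors' implementation. It assumes a one-level Haar lifting filter (the abstract does not name the filter used), a volume stored as nested Python lists indexed `vol[t][y][x]` with even dimensions, and hypothetical function names `haar_lift_1d` and `dwt3d_inplace`. The only extra storage is a scratch buffer one line long, which is why the overhead is small relative to the full volume.

```python
def haar_lift_1d(line):
    """One-level Haar lifting on a list of even length.

    Returns a new list laid out as [approximations..., details...].
    Assumption: Haar is used here only for illustration.
    """
    n = len(line)
    half = n // 2
    out = [0.0] * n  # scratch buffer: one line long, the only extra memory
    for i in range(half):
        a, b = line[2 * i], line[2 * i + 1]
        d = b - a        # predict step: detail coefficient
        s = a + d / 2    # update step: pairwise average (approximation)
        out[i] = s
        out[half + i] = d
    return out


def dwt3d_inplace(vol):
    """One-level separable 3D-DWT, overwriting vol[t][y][x] with coefficients."""
    frames, rows, cols = len(vol), len(vol[0]), len(vol[0][0])

    # Pass 1: filter every row (x direction).
    for t in range(frames):
        for y in range(rows):
            vol[t][y][:] = haar_lift_1d(vol[t][y])

    # Pass 2: filter every column (y direction).
    for t in range(frames):
        for x in range(cols):
            col = haar_lift_1d([vol[t][y][x] for y in range(rows)])
            for y in range(rows):
                vol[t][y][x] = col[y]

    # Pass 3: filter along time (t direction, across the frames of a GOP).
    for y in range(rows):
        for x in range(cols):
            tube = haar_lift_1d([vol[t][y][x] for t in range(frames)])
            for t in range(frames):
                vol[t][y][x] = tube[t]

    return vol
```

Because each of the three passes touches independent lines, the outer loops are natural candidates for distribution across cores or CUDA thread blocks. How the paper actually partitions the work is not stated in the abstract, so that mapping is left out of the sketch.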