An efficient implementation of a 3D wavelet transform based encoder on hyper-threading technology

  • Authors:
  • Gregorio Bernabé;Ricardo Fernández;Jose M. García;Manuel E. Acacio;José González

  • Affiliations:
  • Departamento de Ingeniería y Tecnología de Computadores, Universidad de Murcia, Campus de Espinardo, 30080 Murcia, Spain;Departamento de Ingeniería y Tecnología de Computadores, Universidad de Murcia, Campus de Espinardo, 30080 Murcia, Spain;Departamento de Ingeniería y Tecnología de Computadores, Universidad de Murcia, Campus de Espinardo, 30080 Murcia, Spain;Departamento de Ingeniería y Tecnología de Computadores, Universidad de Murcia, Campus de Espinardo, 30080 Murcia, Spain;Intel Barcelona Research Center, Intel Labs, Barcelona, 08034 Barcelona, Spain

  • Venue:
  • Parallel Computing
  • Year:
  • 2007

Abstract

Medical video compression algorithms based on the 3D wavelet transform achieve both excellent compression rates and very good quality, at the expense of a higher execution time. The goal of this work is to reduce the execution time of our 3D wavelet transform encoder. We examine and exploit the characteristics and advantages of a hyper-threading processor. Intel Hyper-Threading Technology (HT) is based on simultaneous multithreading (SMT), which allows several independent threads to issue instructions to multiple functional units in a single cycle. In particular, we present two approaches, data-domain and functional decomposition, which differ in how the application is decomposed. The first approach is based on data division: each thread performs the same task simultaneously on an independent part of the data. In the second approach, the processing is divided into different tasks that are executed concurrently on the same data set. Based on the latter approach, we present three proposals that differ in how the application's tasks are divided among the threads. Results show speedups of up to 7% with data-domain decomposition and up to 34% with functional decomposition, relative to a version executed without hyper-threading. Finally, we design several implementations of the best method, which uses functional decomposition, with Pthreads and OpenMP, and compare them in terms of execution speed, ease of implementation, and maintainability of the resulting code.
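The difference between the two decompositions can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a single-level unnormalized Haar filter, frames grouped in pairs for the temporal step, the temporal transform applied before the 2D spatial transform, and the hypothetical function names `haar_1d`, `haar_2d`, `haar_time`, `encode_data_domain` and `encode_functional`. In the data-domain version every thread runs the *same* pipeline on its own subset of frame pairs. In the functional version two *different* tasks (temporal and spatial) run concurrently and are connected by a queue. Python threads are used only for brevity. Because of the GIL they show the structure of each decomposition, not the hyper-threading speedup that the paper measures with Pthreads and OpenMP.

```python
import queue
import threading


def haar_1d(x):
    """One level of an (unnormalized) Haar transform: averages, then differences."""
    half = len(x) // 2
    avg = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    dif = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return avg + dif


def haar_2d(frame):
    """Spatial step: Haar along every row, then along every column."""
    rows = [haar_1d(r) for r in frame]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def haar_time(f0, f1):
    """Temporal step on two frames: returns (low-pass frame, high-pass frame)."""
    low = [[(a + b) / 2 for a, b in zip(r0, r1)] for r0, r1 in zip(f0, f1)]
    high = [[(a - b) / 2 for a, b in zip(r0, r1)] for r0, r1 in zip(f0, f1)]
    return low, high


def encode_sequential(frames):
    """Reference: temporal step per frame pair, then 2D step on each subband."""
    out = []
    for k in range(0, len(frames), 2):
        low, high = haar_time(frames[k], frames[k + 1])
        out += [haar_2d(low), haar_2d(high)]
    return out


def encode_data_domain(frames, n_threads=2):
    """Data-domain: each thread runs the whole pipeline on its own frame pairs."""
    n_pairs = len(frames) // 2
    out = [None] * len(frames)

    def worker(t):
        for k in range(t, n_pairs, n_threads):  # pairs t, t+n, t+2n, ...
            low, high = haar_time(frames[2 * k], frames[2 * k + 1])
            out[2 * k], out[2 * k + 1] = haar_2d(low), haar_2d(high)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return out


def encode_functional(frames):
    """Functional: a temporal thread feeds subbands to a concurrent spatial thread."""
    q = queue.Queue(maxsize=4)  # bounded buffer between the two tasks
    out = [None] * len(frames)

    def temporal_stage():
        for k in range(0, len(frames), 2):
            low, high = haar_time(frames[k], frames[k + 1])
            q.put((k, low))
            q.put((k + 1, high))
        q.put(None)  # sentinel: no more subbands

    def spatial_stage():
        while True:
            item = q.get()
            if item is None:
                break
            idx, subband = item
            out[idx] = haar_2d(subband)

    threads = [threading.Thread(target=temporal_stage),
               threading.Thread(target=spatial_stage)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return out


if __name__ == "__main__":
    # 4 frames of 4x4 pixels with a simple ramp pattern.
    frames = [[[float(t * 16 + r * 4 + c) for c in range(4)] for r in range(4)]
              for t in range(4)]
    ref = encode_sequential(frames)
    print(encode_data_domain(frames) == ref, encode_functional(frames) == ref)
```

Both parallel versions perform the same arithmetic in the same order as `encode_sequential`, so their outputs match it exactly. The only difference is how the work is divided among threads: by data in the first case and by task in the second.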