An optimized parallel IDCT on graphics processing units

  • Authors:
  • Biao Wang; Mauricio Alvarez-Mesa; Chi Ching Chi; Ben Juurlink

  • Affiliations:
  • Embedded Systems Architecture, Technische Universität Berlin, Berlin, Germany; Embedded Systems Architecture, Technische Universität Berlin, Berlin, Germany, Multimedia Communications, Fraunhofer HHI, Berlin, Germany; Embedded Systems Architecture, Technische Universität Berlin, Berlin, Germany; Embedded Systems Architecture, Technische Universität Berlin, Berlin, Germany

  • Venue:
  • Euro-Par'12: Proceedings of the 18th International Conference on Parallel Processing Workshops
  • Year:
  • 2012

Abstract

In this paper we present an implementation of the H.264/AVC Inverse Discrete Cosine Transform (IDCT) optimized for Graphics Processing Units (GPUs) using OpenCL. By exploiting the fact that, for real videos, most of the IDCT input data consists of zero-valued coefficients, we create a new compacted data representation that enables several optimizations. Experimental evaluations on different GPUs show average speedups of 1.7× to 7.4× over an optimized single-threaded SIMD CPU version.