A high-throughput ASIC processor for 8×8 transform coding in H.264/AVC

  • Authors:
  • Juan A. Michell;José M. Solana;Gustavo A. Ruiz

  • Affiliations:
  • Departamento de Electrónica y Computadores, Facultad de Ciencias, Universidad de Cantabria, Avda. de Los Castros s/n, 39005 Santander, Spain

  • Venue:
  • Image Communication
  • Year:
  • 2011

Abstract

In July 2004, a new amendment called the Fidelity Range Extensions (FRExt) was added to H.264/AVC, a standardization initiative motivated by the rapidly growing demand for coding higher-fidelity video material. One of the improvements in FRExt is a new 8×8 integer transform that uses only additions and shifts, avoiding mismatches between encoders and decoders. This paper presents a pipelined processor for the real-time implementation of the complete 8×8 transform coding process in H.264: forward 8×8 integer transform, quantization and scaling, rescaling, inverse 8×8 integer transform, and reconstruction of the image block. The architecture was conceived to achieve a high operating frequency and high throughput without increasing hardware complexity. To obtain an efficient implementation, dedicated hardware solutions have been developed for each circuit module. The 8×8 forward and inverse transforms are calculated using the separability property, with an architecture well suited to pipelining that consists of two 1D processors and a transpose register array. New expressions for forward quantization and scaling are presented that allow an efficient hardware implementation by avoiding sign conversion. The inverse quantization has also been optimized for hardware complexity by minimizing the arithmetic operations involved. Furthermore, an exhaustive analysis of the dynamic range of the datapath is carried out to fix the optimum bus widths, reducing circuit size while avoiding overflow. Finally, the critical paths of the various computing units have been carefully analyzed and balanced using a pipeline scheme to maximize the operating frequency without introducing excessive latency.
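The separable, shift-and-add transform computation described in the abstract can be sketched in Python. This is a minimal model under stated assumptions, not the authors' datapath: it uses the commonly cited 8×8 integer core matrix scaled by 8 (the 1/8 normalization is assumed to be absorbed into the quantization/scaling stage), an even/odd butterfly in which every constant multiplication is rewritten as shifts and additions, and a plain list transpose standing in for the transpose register array. All function names are hypothetical.

```python
# Sketch (assumptions noted): 2D 8x8 forward integer transform built from two
# shift-and-add 1D passes and a transpose, i.e. Y = T8 * X * T8^T.
# T8 is the commonly cited integer core matrix scaled by 8; the 1/8 factor is
# assumed to be folded into quantization/scaling. Names are illustrative only.

T8 = [
    [8, 8, 8, 8, 8, 8, 8, 8],
    [12, 10, 6, 3, -3, -6, -10, -12],
    [8, 4, -4, -8, -8, -4, 4, 8],
    [10, -3, -12, -6, 6, 12, 3, -10],
    [8, -8, -8, 8, 8, -8, -8, 8],
    [6, -12, 3, 10, -10, -3, 12, -6],
    [4, -8, 8, -4, -4, 8, -8, 4],
    [3, -6, 10, -12, 12, -10, 6, -3],
]


# Constant multiplications expressed with shifts and additions only.
def mul3(v):
    return (v << 1) + v


def mul6(v):
    return (v << 2) + (v << 1)


def mul10(v):
    return (v << 3) + (v << 1)


def mul12(v):
    return (v << 3) + (v << 2)


def fwd_1d(x):
    """8-point forward transform (equal to T8 @ x) via an even/odd butterfly."""
    s = [x[i] + x[7 - i] for i in range(4)]  # symmetric part feeds even outputs
    d = [x[i] - x[7 - i] for i in range(4)]  # antisymmetric part feeds odd outputs
    y = [0] * 8
    y[0] = (s[0] + s[1] + s[2] + s[3]) << 3
    y[4] = (s[0] - s[1] - s[2] + s[3]) << 3
    y[2] = ((s[0] - s[3]) << 3) + ((s[1] - s[2]) << 2)
    y[6] = ((s[0] - s[3]) << 2) - ((s[1] - s[2]) << 3)
    y[1] = mul12(d[0]) + mul10(d[1]) + mul6(d[2]) + mul3(d[3])
    y[3] = mul10(d[0]) - mul3(d[1]) - mul12(d[2]) - mul6(d[3])
    y[5] = mul6(d[0]) - mul12(d[1]) + mul3(d[2]) + mul10(d[3])
    y[7] = mul3(d[0]) - mul6(d[1]) + mul10(d[2]) - mul12(d[3])
    return y


def transpose(m):
    return [list(row) for row in zip(*m)]


def fwd_2d(block):
    """Row pass, transpose (the transpose-array role), row pass, transpose back."""
    rows = [fwd_1d(r) for r in block]             # first 1D processor: X * T8^T
    cols = [fwd_1d(r) for r in transpose(rows)]   # second 1D processor on columns
    return transpose(cols)                        # T8 * X * T8^T


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def reference_2d(block):
    """Direct matrix product, used only to check the shift-and-add version."""
    return matmul(matmul(T8, block), transpose(T8))
```

For example, a flat block of ones yields a single DC coefficient, `fwd_2d([[1] * 8 for _ in range(8)])[0][0] == 4096`, with every other coefficient zero. Because the largest absolute row sum of `T8` is 64, each 1D pass can grow magnitudes by up to 6 bits, which is the kind of worst-case bound a dynamic-range analysis of the bus widths starts from.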
A prototype of the proposed architecture has been synthesized in a 130 nm HCMOS technology process; it achieves a maximum speed of 330 MHz with a throughput of 2640 Mpixels/s, i.e., eight pixels per clock cycle.
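The abstract's point about avoiding sign conversion in forward quantization can be illustrated by contrasting it with the conventional sign-magnitude formulation, in which the magnitude |W| is quantized as (|W|·MF + f) >> qbits and the sign of W is then restored. The sketch below shows only that conventional baseline; the paper's own expressions are not reproduced here. The shift qbits = 16 + ⌊QP/6⌋ for 8×8 blocks, the rounding offset f = 2^qbits/3 (intra) or 2^qbits/6 (inter), and the multiplier value `MF_PLACEHOLDER` are assumptions, and the function names are hypothetical.

```python
# Sketch only: conventional sign-magnitude forward quantization, the baseline
# that requires sign conversion. qbits, f and MF are assumptions (see text).

def qbits_8x8(qp: int) -> int:
    # Assumed shift for 8x8 blocks: 16 + floor(QP / 6).
    return 16 + qp // 6


def quantize_sign_magnitude(w: int, mf: int, qp: int, intra: bool = True) -> int:
    """Quantize |w|, then restore the sign of w (two sign conversions)."""
    qbits = qbits_8x8(qp)
    f = (1 << qbits) // (3 if intra else 6)  # assumed rounding offset
    level = (abs(w) * mf + f) >> qbits       # magnitude-only datapath
    return -level if w < 0 else level


def quantize_naive_signed(w: int, mf: int, qp: int, intra: bool = True) -> int:
    """Same formula applied directly to signed w: not symmetric around zero."""
    qbits = qbits_8x8(qp)
    f = (1 << qbits) // (3 if intra else 6)
    # Python's >> floors toward -infinity, like a two's-complement arithmetic shift.
    return (w * mf + f) >> qbits


MF_PLACEHOLDER = 13107  # hypothetical multiplier, not taken from the standard's tables

print(quantize_sign_magnitude(3, MF_PLACEHOLDER, 0),
      quantize_sign_magnitude(-3, MF_PLACEHOLDER, 0))  # prints "0 0"
print(quantize_naive_signed(-3, MF_PLACEHOLDER, 0))    # prints "-1"
```

Applying the formula directly to a signed value maps +3 and -3 to different magnitudes, which is why the conventional form wraps the multiply-add-shift between two sign conversions; removing those conversions is what the proposed expressions target.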