Algorithm and hardware design of a fast intra frame mode decision module for H.264/AVC encoders

  • Authors:
  • Daniel Palomino;Guilherme Corrêa;Cláudio Diniz;Sergio Bampi;Luciano Agostini;Altamiro Susin

  • Affiliations:
  • INF, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil;INF, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil;INF, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil;INF, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil;CDTEC, Federal University of Pelotas, Pelotas, RS, Brazil;INF, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil

  • Venue:
  • International Journal of Reconfigurable Computing - Special issue on Selected Papers from the Symposium on Integrated Circuits and Systems Design (SBCCI 2011)
  • Year:
  • 2012

Abstract

In rate-distortion optimization (RDO), the best prediction mode is chosen through exhaustive executions of the whole encoding process, which significantly increases the computational complexity of the encoder. In H.264/AVC intra frame prediction, there are several modes available to encode a macroblock (MB). This work proposes an algorithm and the hardware design for a fast intra frame mode decision module for H.264/AVC encoders. Compared with RDO-based decision, the proposed algorithm reduces the number of encoding iterations needed to choose the best intra mode by more than a factor of 10. The architecture was synthesized to FPGA and achieved an operating frequency of 98 MHz, processing more than 300 HD1080p frames per second. With this approach, we achieved an order-of-magnitude performance improvement over RDO-based approaches, which matters not only for performance but also for energy consumption in battery-operated devices. To compare the architecture with previously published works, we also synthesized it to standard cells. Compared with the best previously reported results, the implemented architecture achieves a fivefold reduction in complexity, a 14-fold increase in processing capability, and an 11-fold reduction in the number of clock cycles per MB.
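
As a rough sanity check on the FPGA figures quoted above, the sketch below estimates the average clock-cycle budget per macroblock implied by 98 MHz and 300 HD1080p frames per second. It is a back-of-the-envelope calculation, not taken from the paper: it assumes 16x16 macroblocks and a frame height padded up to a whole number of macroblock rows, and the function names are illustrative only.

```python
import math

def macroblocks_per_frame(width, height, mb_size=16):
    # Assumption: partial macroblock rows/columns are padded to a full MB.
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)

def cycle_budget_per_mb(clock_hz, fps, width=1920, height=1080):
    # Average number of clock cycles available per macroblock
    # when `fps` frames must be processed every second.
    return clock_hz / (fps * macroblocks_per_frame(width, height))

mbs = macroblocks_per_frame(1920, 1080)   # 120 columns * 68 rows
budget = cycle_budget_per_mb(98e6, 300)   # figures quoted in the abstract
print(mbs, round(budget, 1))
```

Under these assumptions a frame holds 8160 macroblocks, so sustaining more than 300 frames per second at 98 MHz leaves roughly 40 clock cycles per macroblock on average.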