A 4×4 pipelined intra frame decoder for H.264

Authors:
La-Gou Wu;Duo-Li Zhang;Gao-Ming Du;Yu-Kun Song;Ming-Lun Gao
Affiliations:
Institute of VLSI Design, Hefei University of Technology, Hefei, China;Institute of VLSI Design, Hefei University of Technology, Hefei, China;Institute of VLSI Design, Hefei University of Technology, Hefei, China;Institute of VLSI Design, Hefei University of Technology, Hefei, China;Institute of VLSI Design, Hefei University of Technology, Hefei, China
Venue:
ASID'09 Proceedings of the 3rd international conference on Anti-Counterfeiting, security, and identification in communication
Year:
2009

Citing 2
Cited 0

A power-efficient and self-adaptive prediction engine for H.264/AVC decoding
IEEE Transactions on Very Large Scale Integration (VLSI) Systems

A novel computational complexity and power reduction technique for H.264 intra prediction
IEEE Transactions on Consumer Electronics

Abstract

A number of intra frame decoder architectures for H.264 have been put forward, but most of them use the original decoding order, in which the current block's prediction cannot begin until the previous block's reconstruction has finished. This introduces considerable idle time. In this paper, we propose a 4×4 pipelined architecture in which the current block's prediction runs in parallel with the previous block's reconstruction. For intra 4×4, we first reorder the original decoding order and then prejudge the prediction mode, so that the prediction of neighboring blocks can be pipelined. Intra 16×16 and chroma blocks are divided into 4×4 blocks, and intra 16×16 uses the same decoding order as intra 4×4. Furthermore, to increase decoding speed, modules such as inverse quantization, intra prediction, and reconstruction process four pixels in parallel. The architecture is implemented in Verilog HDL as part of an H.264 main profile decoder and verified on an FPGA prototype. Experimental results show that about 223 to 253 cycles are needed and that, compared with the traditional architecture used in [5][6], the reconstruction time of about 18 to 23 blocks is saved, at the cost of only seven 4-bit comparators for prejudging the prediction mode. Running at 62 MHz on an Altera Stratix II FPGA, the design supports real-time decoding of 1080 HD video sequences at 30 fps.
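
The real-time claim can be sanity-checked with a short calculation. The sketch below is not taken from the paper: it assumes that the quoted 223 to 253 cycles are counted per 16×16 macroblock (the abstract does not state the unit) and that a 1080 HD frame is 1920×1080 pixels, padded up to whole macroblocks. The function name `cycle_budget_per_macroblock` is a hypothetical helper introduced only for illustration.

```python
def cycle_budget_per_macroblock(clock_hz, width, height, fps, mb_size=16):
    """Clock cycles available per macroblock for real-time decoding.

    Assumption (not from the paper): every macroblock gets an equal
    share of the clock, and the frame is padded to whole macroblocks.
    """
    mbs_per_row = -(-width // mb_size)   # ceiling division: 1920 / 16 = 120
    mb_rows = -(-height // mb_size)      # ceiling division: 1080 / 16 -> 68
    mbs_per_second = mbs_per_row * mb_rows * fps
    return clock_hz / mbs_per_second


# Figures quoted in the abstract: 62 MHz clock, 1080 HD, 30 fps.
budget = cycle_budget_per_macroblock(62_000_000, 1920, 1080, 30)
print(f"{budget:.1f} cycles available per macroblock")
```

Under these assumptions the budget comes to roughly 253 cycles per macroblock, which would match the upper end of the reported 223 to 253 cycle range and be consistent with the stated 62 MHz operating point.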