Journal of Real-Time Image Processing
Prediction, including intra prediction and inter prediction, is the most critical part of H.264/AVC decoding in terms of processing cycles and computational complexity. These two prediction types demand a huge number of memory accesses and account for up to 80% of the total decoding cycles. In this paper, we present the design and VLSI implementation of a novel power-efficient and highly self-adaptive prediction engine that utilizes a 4 × 4 block-level pipeline. Depending on the prediction requirements, the prediction pipeline stages, as well as the associated memory accesses and datapaths, are fully adjustable, which helps to reduce unnecessary decoding operations and energy dissipation while maintaining a fixed real-time throughput. Compared with conventional designs, the proposed design achieves higher efficiency and lower power consumption by eliminating all redundant operations and making extensive use of pipelining and parallel processing. Under the different prediction modes, our design is able to decode each macroblock within 500 cycles. A prototype H.264/AVC baseline decoder chip that utilizes the proposed prediction engine is fabricated with UMC 0.18-µm CMOS 1P6M technology. The prediction engine contains 79K gates and 2.8 kb of single-port on-chip SRAM, and occupies half of the whole chip area. When running at 1.5 MHz for QCIF 30 f/s real-time decoding, the prediction engine dissipates 268 µW at a 1.8-V power supply.
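The reported figures can be cross-checked with a short calculation. The sketch below assumes the standard QCIF resolution of 176 × 144 luma samples and 16 × 16 macroblocks, neither of which is stated in the abstract, and treats 500 cycles as the worst-case budget per macroblock. The function and constant names are illustrative only. The energy-per-macroblock value is a derived estimate, not a result reported by the authors.

```python
# Assumptions (not stated in the abstract): QCIF is 176 x 144 luma samples,
# and a macroblock covers 16 x 16 luma samples.
QCIF_WIDTH, QCIF_HEIGHT, MB_SIZE = 176, 144, 16

# Figures taken from the abstract.
FRAME_RATE = 30          # frames per second
CYCLES_PER_MB = 500      # upper bound on cycles per macroblock
CLOCK_HZ = 1.5e6         # operating frequency
POWER_W = 268e-6         # prediction engine power at 1.8 V


def macroblocks_per_frame(width, height, mb_size=MB_SIZE):
    """Number of whole macroblocks in one frame."""
    return (width // mb_size) * (height // mb_size)


def required_clock_hz(width, height, frame_rate, cycles_per_mb):
    """Minimum clock rate that sustains real-time decoding."""
    return macroblocks_per_frame(width, height) * frame_rate * cycles_per_mb


def energy_per_macroblock_nj(power_w, width, height, frame_rate):
    """Average energy per macroblock in nanojoules (derived estimate)."""
    mbs_per_second = macroblocks_per_frame(width, height) * frame_rate
    return power_w / mbs_per_second * 1e9


mbs = macroblocks_per_frame(QCIF_WIDTH, QCIF_HEIGHT)            # 99
needed = required_clock_hz(QCIF_WIDTH, QCIF_HEIGHT,
                           FRAME_RATE, CYCLES_PER_MB)           # 1485000
energy_nj = energy_per_macroblock_nj(POWER_W, QCIF_WIDTH,
                                     QCIF_HEIGHT, FRAME_RATE)

print(f"macroblocks per frame: {mbs}")
print(f"required clock: {needed / 1e6:.3f} MHz (available: {CLOCK_HZ / 1e6:.1f} MHz)")
print(f"energy per macroblock: {energy_nj:.1f} nJ")
```

Under these assumptions, 99 macroblocks per frame at 30 f/s and 500 cycles each require 1.485 MHz, which fits within the stated 1.5 MHz clock with very little margin.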