Multi-Frame Motion-Compensated Prediction for Video Transmission
Variable block size motion estimation (VBSME) is one of several contributors to H.264/AVC's excellent coding efficiency. However, its high computational complexity and huge memory traffic make hardware design difficult. In this paper, we propose a memory-efficient and highly parallel VLSI architecture for full-search VBSME (FSVBSME). The architecture consists of 16 two-dimensional arrays, each comprising 16 × 16 processing elements (PEs). Four arrays form a group that matches four reference blocks against one current block in parallel, and four such groups perform block matching for four current blocks in a pipelined fashion. By exploiting the overlap among the multiple reference blocks of a current block, and between the search windows of adjacent current blocks, we propose a novel data reuse scheme that reduces memory access. Compared with the popular Level C data reuse scheme, our approach saves 98% of on-chip memory accesses with only 25% of local memory overhead. Synthesized with a TSMC 180-nm CMOS cell library, the design can process 1920 × 1088 video at 30 frames/s when running at 130 MHz. The architecture is scalable to wider search ranges, multiple reference frames, pixel truncation, and downsampling. We also suggest a criterion, called design efficiency, for comparing different works; by this measure, the proposed design is 72% more efficient than the best design to date.
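To make the full-search VBSME computation concrete, the following is a minimal software sketch of the matching that the PE arrays perform in hardware. It computes the 4×4 sub-block SADs of a 16×16 current block once per candidate displacement and sums them into the 16×16 cost, which is the standard SAD-merging idea behind variable block sizes in H.264/AVC (larger partitions reuse the same 4×4 sub-sums). The padded-window layout, the pixel pattern, and all function names here are illustrative assumptions, not details from the paper.

```python
def sad_4x4(cur, win, pad, bx, by, dx, dy):
    """SAD of the 4x4 sub-block (bx, by) of `cur` against candidate (dx, dy).

    `win` is a search window padded by `pad` pixels on each side, so the
    candidate displacement (dx, dy) in [-pad, pad] never indexes out of range.
    """
    total = 0
    for y in range(4):
        for x in range(4):
            cy, cx = by * 4 + y, bx * 4 + x
            total += abs(cur[cy][cx] - win[pad + cy + dy][pad + cx + dx])
    return total

def full_search_16x16(cur, win, pad):
    """Exhaustively search all candidates; return (best_dx, best_dy, best_sad).

    For each candidate, the sixteen 4x4 SADs are computed once and summed.
    In a real VBSME engine the same sixteen sub-sums would also be merged
    into the 4x8, 8x4, 8x8, 8x16, 16x8 partition costs without re-reading
    any pixels; only the 16x16 merge is shown here for brevity.
    """
    best = (0, 0, float("inf"))
    for dy in range(-pad, pad + 1):
        for dx in range(-pad, pad + 1):
            cost = sum(sad_4x4(cur, win, pad, bx, by, dx, dy)
                       for by in range(4) for bx in range(4))
            if cost < best[2]:
                best = (dx, dy, cost)
    return best
```

A full-HD encoder would run this inner loop for every macroblock and every reference frame, which is why the paper's parallel PE groups and search-window reuse between adjacent current blocks matter so much in practice.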