DSP'09 Proceedings of the 16th international conference on Digital Signal Processing
This paper presents a hardware-efficient, high-speed architecture for variable block size motion estimation (VBSME) in H.264. By improving the pipeline structure and the processing element (PE) circuits, the design reduces both system latency and hardware cost, making it more hardware efficient than the original Propagate Partial SAD architecture. For coding small and medium frame sizes, the proposed structure saves 12.1% of the hardware cost of the original Propagate Partial SAD structure. For HDTV, small inter modes contribute only trivially to coding quality, so modes below 8 × 8 are removed from the design. With this mode reduction technique, when the number of PE array sets is less than 8, the proposed mode-reduction-based Propagate Partial SAD structure runs at a higher clock speed and costs less hardware than the widely used SAD Tree architecture. It is also more robust to tight timing constraints when parallel processing is considered. With TSMC 0.18 μm technology under worst-case operating conditions (1.62 V, 125°C), the peak throughput of the 8-set PE array structure is 720p at 30 Hz with a 128 × 64 search range and 5 reference frames. Compared with the parallel SAD Tree architecture, the design saves 12 k gates.
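To make the mode reduction concrete, the following is a minimal software sketch of how SADs for the variable block sizes of one 16 × 16 macroblock can be built from 4 × 4 SADs, and how dropping the modes below 8 × 8 shrinks the set of SADs to produce. It is only an illustrative model, not the hardware architecture of the paper: it does not reproduce how Propagate Partial SAD or SAD Tree accumulate partial sums, and the function names and the `min_size` parameter are assumptions introduced here.

```python
# Illustrative software model (hypothetical names, not the paper's hardware).

def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid holding the SAD of each 4x4 sub-block of a 16x16 macroblock."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def region_sum(grid, top, left, height, width):
    """Sum the 4x4 SADs covering a region given in units of 4x4 sub-blocks."""
    return sum(grid[r][c]
               for r in range(top, top + height)
               for c in range(left, left + width))


def vbsme_sads(grid, min_size=4):
    """Combine 4x4 SADs into per-block SADs, keyed by (height, width) in pixels.

    min_size is an assumed knob for this sketch: min_size=8 models the
    mode reduction by skipping every block with a side shorter than 8 pixels.
    """
    sizes = [(4, 4), (4, 8), (8, 4), (8, 8), (8, 16), (16, 8), (16, 16)]
    sads = {}
    for h, w in sizes:
        if h < min_size or w < min_size:
            continue
        gh, gw = h // 4, w // 4  # block size measured in 4x4 sub-blocks
        sads[(h, w)] = [region_sum(grid, r, c, gh, gw)
                        for r in range(0, 4, gh)
                        for c in range(0, 4, gw)]
    return sads


if __name__ == "__main__":
    cur = [[1] * 16 for _ in range(16)]
    ref = [[0] * 16 for _ in range(16)]
    grid = sad_4x4_grid(cur, ref)
    full = vbsme_sads(grid)                 # all seven block sizes
    reduced = vbsme_sads(grid, min_size=8)  # modes below 8x8 removed
    print(sum(len(v) for v in full.values()),
          sum(len(v) for v in reduced.values()))
```

In this model the full mode set yields 41 block SADs per search position, while the reduced set yields 9, which gives a rough sense of how much combining logic the mode reduction removes.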