ICIG '04 Proceedings of the Third International Conference on Image and Graphics
Architecture Design for H.264/AVC Integer Motion Estimation with Minimum Memory Bandwidth
IEEE Transactions on Consumer Electronics
A new diamond search algorithm for fast block-matching motion estimation
IEEE Transactions on Image Processing
Hexagon-based search pattern for fast block motion estimation
IEEE Transactions on Circuits and Systems for Video Technology
Overview of the H.264/AVC video coding standard
IEEE Transactions on Circuits and Systems for Video Technology
Analysis and architecture design of an HDTV720p 30 frames/s H.264/AVC encoder
IEEE Transactions on Circuits and Systems for Video Technology
A High-Performance Sum of Absolute Difference Implementation for Motion Estimation
IEEE Transactions on Circuits and Systems for Video Technology
Fast Algorithm and Architecture Design of Low-Power Integer Motion Estimation for H.264/AVC
IEEE Transactions on Circuits and Systems for Video Technology
IEEE Transactions on Circuits and Systems for Video Technology
Analysis and architecture design of scalable fractional motion estimation for H.264 encoding
Integration, the VLSI Journal
ACIVS'12 Proceedings of the 14th international conference on Advanced Concepts for Intelligent Vision Systems
Efficient smart-camera accelerator: A configurable motion estimator dedicated to video codec
Journal of Systems Architecture: the EUROMICRO Journal
An efficient low-cost FPGA implementation of a configurable motion estimation for H.264 video coding
Journal of Real-Time Image Processing
Hi-index | 0.00 |
Fractional Motion Estimation (FME) in high-definition H.264 presents a significant design challenge in terms of memory bandwidth, latency and area cost as there are various modes and complex mode decision flow, which require over 45% of the computation complexity in the H.264 encoding process. In this paper, a new high-performance VLSI architecture for Fractional Motion Estimation (FME) in H.264/AVC based on the full-search algorithm is presented. This architecture is made up of three different pipeline processors to establish a trade-off between processing time and hardware utilization. The computing scheme based on a 4-pixel interpolation unit with a 10-pixel input bandwidth is capable of processing a macroblock (MB) in 870 clock cycles. The final VLSI implementation only requires 11.4 k gates and 4.4kBytes of RAM in a standard 180 nm CMOS technology operating at 290 MHz. Our design generates the residual image and the best MVs and mode in a high throughput and low area cost architecture while achieving enough processing capacity for 1080HD (1920驴脳驴1088@30fps) real-time video streams.