Variable block-size motion estimation (VBSME) accounts for a major part of the computation in an H.264 encoder and is usually accelerated by bit-parallel hardware architectures with large I/O bit widths to meet real-time constraints. However, such architectures increase area overhead and pin count, and are therefore unsuitable for area-constrained consumer electronics designs such as small portable multimedia devices. This paper addresses this problem by proposing two area-efficient least significant bit (LSB) bit-serial architectures with small pin counts. Both designs exploit data reuse, in different ways, for sum of absolute differences (SAD) computation and for reading reference pixels, leading to a considerable reduction in memory bandwidth. The first architecture propagates the partial SAD and sum results and broadcasts the reference pixel rows, whereas the second design reuses the SADs of small blocks and has a reconfigurable reference buffer, which gives better memory bandwidth when hardware parallelism is used. The proposed designs benefit from several optimization techniques, including an efficient serial absolute difference architecture, word length reduction by parallelism, bit truncation, mode filtering, and macroblock (MB) level subsampling, which significantly improve their performance in terms of silicon area, throughput, latency, and power consumption. The first and second designs can support full-search VBSME of 720 × 480 video at 30 frames per second (fps), with two reference frames and a [−16, 15] search range, at a clock frequency of 414 MHz with 29.28 k and 31.5 k gates, respectively.
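The idea of reusing the SADs of small blocks can be illustrated in software. The sketch below is not the paper's hardware design; it only shows, under the standard H.264 partitioning of a 16 × 16 macroblock, how the SADs of all 41 partitions at one search position can be obtained by adding sixteen 4 × 4 SADs rather than recomputing pixel differences. The function names `sad_4x4_grid` and `merge_sads` and the dictionary keys (written as width x height) are hypothetical and chosen for illustration.

```python
def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid of SADs, one per 4x4 sub-block of a 16x16 macroblock.

    cur and ref are 16x16 lists of pixel values (current MB and one
    candidate reference block).
    """
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def merge_sads(grid):
    """Build the SADs of all 41 partitions by adding 4x4 SADs only."""
    g = grid
    sads = {}
    sads["4x4"] = [g[r][c] for r in range(4) for c in range(4)]
    # 8 wide, 4 tall: two horizontally adjacent 4x4 blocks
    sads["8x4"] = [g[r][c] + g[r][c + 1] for r in range(4) for c in (0, 2)]
    # 4 wide, 8 tall: two vertically adjacent 4x4 blocks
    sads["4x8"] = [g[r][c] + g[r + 1][c] for r in (0, 2) for c in range(4)]
    # 8x8 quadrants, ordered top-left, top-right, bottom-left, bottom-right
    sads["8x8"] = [
        g[r][c] + g[r][c + 1] + g[r + 1][c] + g[r + 1][c + 1]
        for r in (0, 2)
        for c in (0, 2)
    ]
    q = sads["8x8"]
    sads["16x8"] = [q[0] + q[1], q[2] + q[3]]  # top half, bottom half
    sads["8x16"] = [q[0] + q[2], q[1] + q[3]]  # left half, right half
    sads["16x16"] = [sum(q)]
    return sads
```

Calling `merge_sads(sad_4x4_grid(cur, ref))` yields 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41 SAD values for one candidate position; a full search would repeat this for every position in the search range and keep the minimum per partition.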