Computers and Electrical Engineering
Fast search algorithms (FSAs) for variable block size motion estimation follow irregular search (data access) patterns, which is the main challenge in designing hardware architectures for them. In this study, we build a baseline architecture for fast search algorithms from state-of-the-art components available in academia. We improve its performance by introducing: (1) a super 2-dimensional (2-D) random access memory architecture that reads regular and interleaved pairs of rows or columns, as opposed to the single-row or single-column access of the state of the art; and (2) a 2-D processing element array with a tuned interconnect that supports the neighborhood connections required by conventional fast search algorithms and exploits on-chip data reuse. Results show that, compared to the baseline architecture, our design increases system throughput by up to 85.47% and reduces power by up to 13.83%, at the cost of an area increase of up to 65.53% in the worst case.
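To make the "irregular search pattern" concrete, the following is a minimal software sketch of one well-known fast search algorithm, diamond search, that records the order in which candidate positions are read. It is an illustration only and not the architecture or algorithm evaluated in this study; the function names (`sad`, `diamond_search`), the list-of-lists frame representation, and the parameters are all assumptions introduced here.

```python
from typing import Dict, List, Optional, Tuple

Frame = List[List[int]]
Vec = Tuple[int, int]

# Offsets (dx, dy) of the large and small diamond search patterns.
LARGE: List[Vec] = [(0, 0), (0, -2), (1, -1), (2, 0), (1, 1),
                    (0, 2), (-1, 1), (-2, 0), (-1, -1)]
SMALL: List[Vec] = [(0, 0), (0, -1), (1, 0), (0, 1), (-1, 0)]


def sad(cur: Frame, ref: Frame, bx: int, by: int,
        dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by)
    in cur and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def diamond_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                   search_range: int) -> Tuple[Vec, List[Vec]]:
    """Return (motion vector, candidates in the order they were read).

    Assumes the zero-displacement block lies fully inside ref.
    """
    h, w = len(ref), len(ref[0])
    cache: Dict[Vec, int] = {}
    visited: List[Vec] = []  # data-dependent access order

    def cost(d: Vec) -> Optional[int]:
        dx, dy = d
        if abs(dx) > search_range or abs(dy) > search_range:
            return None  # outside the search window
        if not (0 <= bx + dx and bx + dx + n <= w
                and 0 <= by + dy and by + dy + n <= h):
            return None  # block would fall outside the reference frame
        if d not in cache:
            cache[d] = sad(cur, ref, bx, by, dx, dy, n)
            visited.append(d)
        return cache[d]

    def best_of(center: Vec, pattern: List[Vec]) -> Vec:
        best, best_cost = center, cost(center)
        for ox, oy in pattern:
            cand = (center[0] + ox, center[1] + oy)
            c = cost(cand)
            if c is not None and c < best_cost:
                best, best_cost = cand, c
        return best

    center: Vec = (0, 0)
    while True:  # repeat the large pattern until the center is the best
        nxt = best_of(center, LARGE)
        if nxt == center:
            break
        center = nxt
    center = best_of(center, SMALL)  # final refinement step
    return center, visited
```

Because each step's center depends on the costs computed in the previous step, the sequence in `visited` varies from block to block. A row-at-a-time or column-at-a-time memory serves such a sequence poorly, which is the kind of access problem that motivates a memory and processing element array designed for irregular reads.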