Fast MPEG-4 Motion Estimation: Processor Based and Flexible VLSI Implementations
Journal of VLSI Signal Processing Systems - Special issue on implementation of MPEG-4 multimedia codecs
Video compression with parallel processing
Parallel Computing - Parallel computing in image and video processing
A 2D Addressing Mode for Multimedia Applications
Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
A 2D addressing mode for multimedia applications
Embedded processor design challenges
VLSI Architecture for a Flexible Motion Estimation with Parameters
ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
Survey on Block Matching Motion Estimation Algorithms and Architectures with New Results
Journal of VLSI Signal Processing Systems
Energy-efficient motion estimation using error-tolerance
Proceedings of the 2006 international symposium on Low power electronics and design
Parallel motion estimation on the MDSP multiprocessor
Neural, Parallel & Scientific Computations
Error-resilient motion estimation architecture
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A configurable motion estimation architecture for block-matching algorithms
IEEE Transactions on Circuits and Systems for Video Technology
Motion estimation and CABAC VLSI co-processors for real-time high-quality H.264/AVC video coding
Microprocessors & Microsystems
Hi-index | 0.00 |
This paper describes a novel architecture that offers the flexibility of implementing widely varying motion-estimation algorithms. To achieve real-time performance, we employ multiple processing elements (PE's) which communicate with multiple memory banks via a multistage interconnection network. Three different block-matching algorithms-full search, three-step search, and conjugate-direction search-have been mapped onto this architecture to illustrate its programmability. We schedule the desired operations and design the required data-flow in such a way that processor utilization is high and memory bandwidth is at a feasible level. The details regarding the flow of the pixel data and the scheduling and allocation of the desired ALU operations (which pixels are processed on which processors in which clock cycles) are described in the paper. We analyze the performance of the proposed architecture for several different interconnection networks and data-memory organizations