A flexible parallel architecture adapted to block-matching motion-estimation algorithms

Authors:
S. Dutta; W. Wolf
Affiliations:
Dept. of Electrical Engineering, Princeton University, NJ
Venue:
IEEE Transactions on Circuits and Systems for Video Technology
Year:
1996

Abstract

This paper describes a novel architecture flexible enough to implement widely varying motion-estimation algorithms. To achieve real-time performance, we employ multiple processing elements (PEs) that communicate with multiple memory banks via a multistage interconnection network. Three block-matching algorithms (full search, three-step search, and conjugate-direction search) have been mapped onto this architecture to illustrate its programmability. We schedule the operations and design the data flow so that processor utilization is high and the required memory bandwidth remains feasible. The paper details the flow of pixel data and the scheduling and allocation of ALU operations (which pixels are processed on which processors in which clock cycles). We analyze the performance of the proposed architecture for several different interconnection networks and data-memory organizations.
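For readers unfamiliar with the algorithms named in the abstract, the following is a minimal sequential sketch of two of them, full search and three-step search. It does not reproduce the paper's parallel mapping, PE scheduling, or memory organization. The sum of absolute differences (SAD) matching criterion, the frame layout as nested lists, and all function and parameter names (`sad`, `in_bounds`, `full_search`, `three_step_search`, `n`, `p`, `step`) are assumptions made for illustration only.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (dx, dy) in `ref`.
    SAD is an assumed matching criterion; the abstract does not name one."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def in_bounds(ref, bx, by, dx, dy, n):
    """True if the displaced n x n block lies entirely inside `ref`."""
    h, w = len(ref), len(ref[0])
    return 0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h


def full_search(cur, ref, bx, by, n, p):
    """Evaluate every displacement in [-p, p] x [-p, p]; return (dx, dy, cost)."""
    best_cost, best_dx, best_dy = sad(cur, ref, bx, by, 0, 0, n), 0, 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if in_bounds(ref, bx, by, dx, dy, n):
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if cost < best_cost:
                    best_cost, best_dx, best_dy = cost, dx, dy
    return best_dx, best_dy, best_cost


def three_step_search(cur, ref, bx, by, n, step=4):
    """Coarse-to-fine search: test the 8 neighbours of the current centre at
    the current step size, move to the best one, then halve the step.
    With step=4 this covers displacements up to +/-7 in three steps."""
    cx, cy = 0, 0
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        next_cx, next_cy = cx, cy
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                dx, dy = cx + ox, cy + oy
                if (ox or oy) and in_bounds(ref, bx, by, dx, dy, n):
                    cost = sad(cur, ref, bx, by, dx, dy, n)
                    if cost < best_cost:
                        best_cost, next_cx, next_cy = cost, dx, dy
        cx, cy = next_cx, next_cy
        step //= 2
    return cx, cy, best_cost
```

Full search evaluates all (2p + 1)^2 candidate displacements, while three-step search evaluates at most 25, so it may settle on a displacement with a higher cost than the full-search optimum. That difference in regularity and operation count is the kind of variation a programmable architecture has to accommodate.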