Parallelization of Full Search Motion Estimation Algorithm for Parallel and Distributed Platforms

Authors:
Eduarda Monteiro;Bruno Vizzotto;Cláudio Diniz;Marilena Maule;Bruno Zatt;Sergio Bampi
Affiliations:
Informatics Institute, PPGC, PGMICRO, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
Venue:
International Journal of Parallel Programming
Year:
2014

Abstract

This work presents an efficient method for mapping the Full Search algorithm for Motion Estimation (ME) onto General-Purpose Graphics Processing Unit (GPGPU) architectures using the Compute Unified Device Architecture (CUDA) programming model. Our method jointly exploits the massive parallelism available in current GPGPU devices and the inherent parallelism of the Full Search algorithm. Our main goal is to evaluate the feasibility of implementing video codecs on GPGPUs, along with the advantages and drawbacks of doing so compared with other platforms. For comparison, three further solutions were developed using distinct programming paradigms for distinct underlying hardware architectures: (i) a sequential solution for a general-purpose processor (GPP); (ii) a parallel solution for multi-core GPPs using the OpenMP library; and (iii) a distributed solution for cluster/grid machines using the Message Passing Interface (MPI) library. For different search areas, the CUDA-based solution achieves speed-ups consistent with those predicted by the theoretical model. Our GPGPU Full Search Motion Estimation achieves speed-ups of 2×, 20×, and 1664× over the MPI, OpenMP, and sequential implementations, respectively. Compared with the state of the art, our solution achieves a speed-up of up to 17×.
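The abstract does not include the authors' code, so the following is only a minimal sketch of Full Search block matching under assumptions of our own: 8-bit luma frames stored row-major, the sum of absolute differences (SAD) as the matching criterion, square blocks, a ±`range` search window, and candidates restricted to positions that lie fully inside the reference frame. The names `full_search_me`, `block_sad`, and `MotionVector` are hypothetical. The OpenMP pragma over block rows illustrates the kind of block-level parallelism a multi-core version can exploit; it is not the paper's actual work partitioning.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Best match for one block (hypothetical type). Convention assumed here:
 * the block at (x, y) in the current frame matches the block at
 * (x + dx, y + dy) in the reference frame, with cost `sad`. */
typedef struct {
    int dx, dy;
    unsigned sad;
} MotionVector;

/* SAD between the block x block region at (cx, cy) in `cur` and the region
 * at (rx, ry) in `ref`. Frames are row-major with `width` pixels per row.
 * The caller guarantees both regions lie inside the frame. */
static unsigned block_sad(const uint8_t *cur, const uint8_t *ref, int width,
                          int cx, int cy, int rx, int ry, int block)
{
    unsigned sad = 0;
    for (int j = 0; j < block; j++) {
        const uint8_t *c = cur + (size_t)(cy + j) * width + cx;
        const uint8_t *r = ref + (size_t)(ry + j) * width + rx;
        for (int i = 0; i < block; i++)
            sad += (unsigned)abs((int)c[i] - (int)r[i]);
    }
    return sad;
}

/* Exhaustive (Full Search) block matching: every displacement in
 * [-range, range]^2 whose reference block lies fully inside the frame is
 * evaluated, and the smallest SAD is kept (first in raster order on ties).
 * Results go to `out` in raster block order; only the complete
 * (width / block) x (height / block) blocks are processed. */
void full_search_me(const uint8_t *cur, const uint8_t *ref, int width,
                    int height, int block, int range, MotionVector *out)
{
    assert(block > 0 && range >= 0);
    const int nbx = width / block, nby = height / block;

    /* Blocks are independent, so block rows can be split across threads
     * when compiled with OpenMP (e.g. -fopenmp); otherwise the pragma is
     * ignored and the loop runs sequentially. */
    #pragma omp parallel for
    for (int by = 0; by < nby; by++) {
        for (int bx = 0; bx < nbx; bx++) {
            const int x = bx * block, y = by * block;
            MotionVector best = {0, 0, UINT_MAX};
            for (int dy = -range; dy <= range; dy++) {
                for (int dx = -range; dx <= range; dx++) {
                    const int rx = x + dx, ry = y + dy;
                    /* Skip candidates that would read outside the frame. */
                    if (rx < 0 || ry < 0 || rx + block > width ||
                        ry + block > height)
                        continue;
                    unsigned sad = block_sad(cur, ref, width, x, y, rx, ry, block);
                    if (sad < best.sad) {
                        best.dx = dx;
                        best.dy = dy;
                        best.sad = sad;
                    }
                }
            }
            /* Candidate (0, 0) is always valid, so `best` is always set. */
            out[by * nbx + bx] = best;
        }
    }
}
```

For 64×64 frames with 16×16 blocks, `out` must hold 16 entries. Because each block's search only reads the two frames and writes its own entry in `out`, blocks could equally be distributed across MPI processes or GPU threads without synchronization. For a block whose whole search window lies inside the frame, the work is (2·range + 1)² · block² absolute differences, so enlarging the search area increases the computation per block quadratically.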