Fine-grained CUDA-based Parallel Intra Prediction for H.264/AVC

Authors:
Wenbin Jiang;Min Long;Hai Jin;Pengcheng Wang
Affiliations:
Services Computing Technology and System Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China;Services Computing Technology and System Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China;Services Computing Technology and System Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China;Services Computing Technology and System Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Venue:
Proceedings of Network and Operating System Support on Digital Audio and Video Workshop
Year:
2014

Citing 10
Cited 0

Streaming HD H.264 encoder on programmable processors

MM '09 Proceedings of the 17th ACM international conference on Multimedia
Highly parallel rate-distortion optimized intra-mode decision on multicore graphics processors

IEEE Transactions on Circuits and Systems for Video Technology
Intra frame encoding using programmable graphics hardware

PCM'07 Proceedings of the multimedia 8th Pacific Rim conference on Advances in multimedia information processing
A Multilevel Parallel Intra Coding for H.264/AVC Based on CUDA

ICIG '11 Proceedings of the 2011 Sixth International Conference on Image and Graphics
Fast H.264/AVC FRExt Intra Coding Using Belief Propagation

IEEE Transactions on Image Processing
Overview of the H.264/AVC video coding standard

IEEE Transactions on Circuits and Systems for Video Technology
Fast mode decision algorithm for intraprediction in H.264/AVC video coding

IEEE Transactions on Circuits and Systems for Video Technology
Intensity Gradient Technique for Efficient Intra-Prediction in H.264/AVC

IEEE Transactions on Circuits and Systems for Video Technology
An Efficient VLSI Architecture for Transform-Based Intra Prediction in H.264/AVC

IEEE Transactions on Circuits and Systems for Video Technology
A Parallel H.264 Encoder with CUDA: Mapping and Evaluation

ICPADS '12 Proceedings of the 2012 IEEE 18th International Conference on Parallel and Distributed Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Recently, the power of the Graphics Processing Unit (GPU) has largely increased, whereas previous works of intra prediction on the GPU could not efficiently exploit the massive parallel opportunity. The related work only achieves frame-level, slice-level or block-level parallelism. It is a challenge to implement fine-grained parallelism on the Compute Unified Device Architecture (CUDA), such as pixel-level and mode-level, because the irregular formulas of intra prediction and the constraints posed by H.264/AVC cause significant branch instructions and the CUDA architecture is inherently not good at handling branches. In this paper, a CUDA-based approach that adopts fine-grained parallelism is presented. By transforming the various prediction formulas to the same form and introducing the predictor unit, an algorithm based on a lookup table is proposed to efficiently eliminate the branches. In addition, the combinatorial frame technique and the optimized encoding order are adopted to maximize the parallelism. Experimental results show that significant encoding time reduction can be achieved and the proposed algorithm outperforms previous works.