Many studies identify the deblocking filter as one of the most time-consuming modules in the H.264/AVC decoder, so accelerating it is critical to improving overall decoding performance. This paper proposes a novel parallel algorithm for the H.264/AVC deblocking filter to speed up the decoder. We exploit pixel-level data parallelism among filtering steps and observe that the result of each filtering step affects only a limited region of pixels. We call this "the limited propagation effect". Based on this observation, the proposed algorithm can partition a frame into multiple independent rectangles of arbitrary granularity. The proposed parallel deblocking filter requires very little synchronization overhead and scales well. Experimental results show that applying the proposed parallelization method to a SIMD-optimized sequential deblocking filter achieves speedups of up to 95.31% on a two-core processor and 224.07% on a four-core processor. For H.264/AVC decoding as a whole, we observe significant speedups of 21% and 34% on a two-core and a four-core processor, respectively.
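The partitioning idea can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the paper: it shows only how a frame might be tiled into macroblock-aligned rectangles and handed to worker threads, with a single join as the only synchronization point. The names `split`, `partition_frame`, `process_rectangle`, and `filter_frame_parallel` are hypothetical, and `process_rectangle` is a placeholder that writes only inside its own rectangle. A real deblocking filter reads and modifies pixels across block edges, which is precisely the dependency the limited propagation effect is used to bound. The sketch also assumes frame dimensions that are multiples of 16, the H.264 macroblock size.

```python
from concurrent.futures import ThreadPoolExecutor

MB = 16  # H.264 macroblock size in luma pixels


def split(length_mb, parts):
    """Split length_mb macroblocks into contiguous, near-equal runs."""
    base, extra = divmod(length_mb, parts)
    bounds, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        if size:  # skip empty runs when parts > length_mb
            bounds.append((start, start + size))
        start += size
    return bounds


def partition_frame(width, height, cols, rows):
    """Return macroblock-aligned rectangles (x0, y0, x1, y1) tiling the frame."""
    if width % MB or height % MB:
        raise ValueError("sketch assumes dimensions are multiples of 16")
    xs = split(width // MB, cols)
    ys = split(height // MB, rows)
    return [(x0 * MB, y0 * MB, x1 * MB, y1 * MB)
            for (y0, y1) in ys for (x0, x1) in xs]


def process_rectangle(frame, width, rect):
    """Hypothetical stand-in for per-rectangle filtering; writes only inside rect."""
    x0, y0, x1, y1 = rect
    for y in range(y0, y1):
        row = y * width
        for x in range(x0, x1):
            frame[row + x] = (frame[row + x] + 1) & 0xFF
    return rect


def filter_frame_parallel(frame, width, height, cols, rows, workers):
    """Process every rectangle on a thread pool and return the rectangles used."""
    rects = partition_frame(width, height, cols, rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Rectangles never write to each other's pixels, so no locks are
        # needed; the only synchronization is the join when the pool drains.
        list(pool.map(lambda r: process_rectangle(frame, width, r), rects))
    return rects
```

For example, `filter_frame_parallel(bytearray(64 * 48), 64, 48, 2, 2, 4)` splits a 64x48 frame into four rectangles. Changing `cols` and `rows` changes the granularity without touching the per-rectangle code. CPython threads will not reproduce the reported speedups; the sketch conveys the structure only.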