Pipelining has been applied in many areas to improve system performance by overlapping the execution of hardware or software computing stages. However, directly pipelining H.264 decoding is difficult because video bitstreams are encoded with many dependencies, leaving little parallelism to exploit. Fortunately, pure software pipelining can still be applied to H.264 decoding at the macroblock level with a reasonable performance gain. However, the pipeline stages may need to synchronize with each other, which incurs considerable extra overhead. For optimized decoders, this overhead is relatively more significant, and software pipelining can even degrade performance. We propose grouping multiple stages into larger batches and executing these batches concurrently, a technique called batch-pipelining, to exploit more parallelism on multicore systems. Experimental results show that it speeds up decoding by up to 89% over an optimized H.264 decoder and achieves up to 259 and 69 frames per second at 720p and 1080p, respectively, on a 4-core x86 machine. Because of its flexibility, batch-pipelining can be applied not only to H.264 but also to many similar applications, such as the next-generation video coding standard, High Efficiency Video Coding (HEVC). Therefore, we believe the batch-pipelining mechanism opens a new and effective direction for software codec development.
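The synchronization cost discussed above can be illustrated with a minimal sketch. It is not the decoder described in the abstract: the two stages, the placeholder arithmetic, and the names `run_pipeline`, `batch_size`, and `handoffs` are all hypothetical, and the sketch assumes that a batch is a group of macroblocks handed from one stage to the next in a single queue transfer. It only shows that larger batches reduce the number of points at which the stages must synchronize; it does not measure decoding speed.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def run_pipeline(macroblocks, batch_size):
    """Run a two-stage software pipeline that hands work over in batches.

    Returns (results, handoffs). `handoffs` counts queue transfers, i.e. the
    points at which the two stages synchronize with each other.
    """
    handoff_queue = queue.Queue()
    results = []
    handoffs = 0

    def stage_one():
        # Stand-in for an early decoding stage (placeholder arithmetic only).
        nonlocal handoffs
        for start in range(0, len(macroblocks), batch_size):
            batch = [mb * 2 for mb in macroblocks[start:start + batch_size]]
            handoff_queue.put(batch)  # one synchronization per batch
            handoffs += 1
        handoff_queue.put(SENTINEL)

    def stage_two():
        # Stand-in for a later decoding stage that consumes whole batches.
        while True:
            batch = handoff_queue.get()
            if batch is SENTINEL:
                break
            results.extend(value + 1 for value in batch)

    threads = [threading.Thread(target=stage_one),
               threading.Thread(target=stage_two)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, handoffs


if __name__ == "__main__":
    blocks = list(range(8))
    per_block, per_block_syncs = run_pipeline(blocks, batch_size=1)
    batched, batched_syncs = run_pipeline(blocks, batch_size=4)
    # Same output either way; only the number of handoffs differs.
    print(per_block == batched, per_block_syncs, batched_syncs)
```

With eight items, a batch size of 1 requires eight handoffs while a batch size of 4 requires two, and the output is identical. A real decoder would have to choose the batch size so that the saved synchronization outweighs the added latency and still respects the dependencies between macroblocks, which this sketch ignores.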