The current trend towards multi-core processors makes it necessary to find viable strategies for exploiting the additional computational resources in media processing. For video decoding, the challenges include partitioning the decoder steps appropriately, tracking dependencies efficiently, and allocating and synchronizing resources across multiple threads while keeping the resulting overhead in check. In this paper, we propose two variants of multithreading with distributed synchronization. The first method is optimized for minimum-latency decoding, as required by conversational applications. The second method aims to maximize total throughput at the cost of higher latency. In addition, we propose a method of dynamic core usage to reduce the processing resources that are allocated overall as a result of interprocess communication overhead. This method is based on a coarse-grained complexity estimation. To adapt implicitly to different combinations of processor architectures, associated memory interfaces and power-saving states, the scheme is feedback-assisted: by correlating the initial estimate with the processing time actually required, it obtains a sufficiently accurate prediction of the number of cores needed for the image processing part. Experimental results demonstrate scaling by a factor of up to 3.5 on a quad-core machine, as well as the limits that the complexity of sequential bitstream processing imposes on the proposed approach. We demonstrate that real-time decoding at 4k resolution is feasible on current mid-range PC hardware. For less demanding streams, the adaptive mode reduces the total required CPU resources by up to 10% compared to the greedy approach.
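The feedback-assisted core prediction described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class `CoreEstimator`, its parameters, the abstract "complexity units", and the use of an exponential moving average to correlate estimate and measurement are all assumptions made for illustration only.

```python
import math


class CoreEstimator:
    """Hypothetical feedback-assisted predictor of the cores needed per frame."""

    def __init__(self, max_cores, frame_budget_s, smoothing=0.25):
        self.max_cores = max_cores            # cores available on the machine
        self.frame_budget_s = frame_budget_s  # real-time budget per frame, in seconds
        self.smoothing = smoothing            # weight of the newest measurement (assumed)
        self.seconds_per_unit = None          # learned cost of one complexity unit

    def predict_cores(self, complexity):
        """Return the number of cores to allocate for a frame of given complexity."""
        if self.seconds_per_unit is None:
            # No feedback yet: fall back to the greedy choice of all cores.
            return self.max_cores
        predicted_time = complexity * self.seconds_per_unit
        cores = math.ceil(predicted_time / self.frame_budget_s)
        return max(1, min(self.max_cores, cores))

    def observe(self, complexity, cpu_seconds):
        """Feed back the measured processing time for a frame that was decoded."""
        if complexity <= 0:
            return
        sample = cpu_seconds / complexity
        if self.seconds_per_unit is None:
            self.seconds_per_unit = sample
        else:
            a = self.smoothing
            self.seconds_per_unit = (1 - a) * self.seconds_per_unit + a * sample


# Usage: a 25 fps stream (40 ms budget per frame) on a quad-core machine.
estimator = CoreEstimator(max_cores=4, frame_budget_s=0.040)
first_guess = estimator.predict_cores(1000)           # greedy before any feedback
estimator.observe(complexity=1000, cpu_seconds=0.060)
adapted = estimator.predict_cores(1000)               # adapted after one measurement
```

Because the cost per complexity unit is learned from measured times rather than fixed in advance, the same sketch would adapt to a slower memory interface or a power-saving state without being told about either, which is the motivation the abstract gives for making the scheme feedback-assisted.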