Scans as Primitive Parallel Operations
IEEE Transactions on Computers
Slice-balancing H.264 video encoding for improved scalability of multicore decoding
EMSOFT '07 Proceedings of the 7th ACM & IEEE international conference on Embedded software
Validity of the single processor approach to achieving large scale computing capabilities
AFIPS '67 (Spring) Proceedings of the April 18-20, 1967, spring joint computer conference
Adaptive slice-level parallelism for H.264/AVC encoding using pre macroblock mode selection
Journal of Visual Communication and Image Representation
A scalable parallel H.264 decoder on the cell broadband engine architecture
CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Evaluation of data-parallel H.264 decoding approaches for strongly resource-restricted architectures
Multimedia Tools and Applications
Thread-parallel MPEG-2, MPEG-4 and H.264 video encoders for SoC multi-processor architectures
IEEE Transactions on Consumer Electronics
Accelerate video decoding with generic GPU
IEEE Transactions on Circuits and Systems for Video Technology
Hi-index | 0.00 |
With the increasing number of processor cores available in modern computing architectures, task or data parallelism is required to maximally exploit the available hardware and achieve optimal processing speed. Current state-of-the-art data-parallel processing methods for decoding image and video bitstreams are limited in parallelism by dependencies introduced by the coding tools and the number of synchronization points introduced by these dependencies, only allowing task or coarse-grain data parallelism. In particular, entropy decoding and data prediction are bottleneck coding tools for parallel image and video decoding. We propose a new data-parallel processing scheme for block-based intra sample and coefficient prediction that allows fine-grain parallelism and is suitable for integration in current and future state-of-the-art image and video codecs. Our prediction scheme enables maximum concurrency, independent of slice or tile configuration, while minimizing synchronization points. This paper describes our data-parallel processing scheme for one- and two-dimensional prediction and investigates its application to block-based image and video codecs using JPEG XR and H.264/AVC Intra as a starting point. We show how our scheme enables faster decoding than the state-of-the-art wavefront method with speedup factors of up to 21.5 and 7.9 for JPEG XR and H.264/AVC Intra coding tools respectively. Using the H.264/AVC Intra coding tool, we discuss the requirements of the algorithm and the impact on decoded image quality when these requirements are not met. Finally, we discuss the impact on coding rate in order to allow for optimal parallel intra decoding.