The H.264 video codec provides exceptional video compression at the cost of a dramatic increase in computational complexity over previous standards. Although exploiting parallelism in H.264 is notoriously difficult, successful parallel implementations promise substantial performance gains, particularly as High Definition (HD) content penetrates a widening variety of applications. We present a highly scalable parallelization scheme implemented on IBM's multicore Cell Broadband Engine (CBE) and based on FFmpeg's open-source H.264 video decoder. We address resource limitations and complex data dependencies to achieve nearly ideal decoding speedup for the parallelizable portion of the encoded stream. Our decoder achieves better performance than previous implementations and scales well to large-format video. We discuss architecture- and codec-specific performance optimizations, code overlays, data structures, memory access scheduling, and vectorization.