This paper presents the optimization of an H.264/AVC encoder on three different architectures. The architectures include both multi-core and single-core designs, and their SIMD instruction sets differ in vector register size. Code optimization is essential when targeting HD resolutions under real-time constraints. The encoder is divided into functional modules to identify where optimization matters most and to evaluate the performance improvement in detail. Common issues in partitioning a video encoder across parallel architectures and in SIMD optimization are described, and the authors' solutions are presented for all three architectures. Besides showing efficient video encoder implementations, one of the main purposes of this paper is to discuss how the characteristics of different architectures and different SIMD instruction sets affect the performance of the target application. Speedup results are reported to compare the implementations and to identify the most suitable solutions for current and next-generation video coding algorithms.