In July 2004, a new amendment called Fidelity Range Extensions (FRExt) was added to H.264/AVC as a standardization initiative motivated by the rapidly growing demand for coding higher-fidelity video material. One of the improvements in FRExt is a new 8×8 integer transform that uses only additions and shifts, so that encoders and decoders compute exactly the same result and mismatches between them are avoided. This paper presents a pipelined processor for the real-time implementation of the complete 8×8 transform coding process in H.264: forward 8×8 integer transform, quantization and scaling, rescaling, inverse 8×8 integer transform, and reconstruction of the image block. The architecture has been designed to achieve a high operating frequency and high throughput without increasing hardware complexity. To obtain an efficient implementation, dedicated hardware solutions have been developed for each circuit module. The 8×8 forward and inverse transforms are computed using the separability property, with an architecture well suited to pipelining that consists of two 1-D processors and a transpose register array. New expressions for forward quantization and scaling are presented that allow an efficient hardware implementation by avoiding sign conversion. The inverse quantization has also been optimized for hardware complexity by minimizing the arithmetic operations involved. Furthermore, an exhaustive analysis of the dynamic range of the datapath is carried out to fix the optimum bus widths, reducing circuit size while avoiding overflow. Finally, the critical paths of the various computing units have been carefully analyzed and balanced using a pipeline scheme in order to maximize the operating frequency without introducing excessive latency.
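To make the separability property and the adds-and-shifts arithmetic concrete, here is a minimal Python sketch. It is not the authors' architecture. It assumes the integer core matrix of the FRExt 8×8 transform written with integer entries (8, 12, 10, 6, 4, 3), leaves the normalization to the quantization stage (not modelled), and stands in for the two 1-D processors and the transpose register array with two calls to a hypothetical `fwd_1d` function and a list transpose between them. The names `C8`, `fwd_1d` and `fwd_2d` are illustrative only.

```python
# Illustrative sketch (not the paper's design): exact integer 8x8 forward core
# transform of H.264 FRExt, computed with additions and shifts only.
# Normalization is assumed to be folded into quantization and is omitted here.

# Integer core matrix; the normative transform corresponds to C8 / 8.
C8 = [
    [8, 8, 8, 8, 8, 8, 8, 8],
    [12, 10, 6, 3, -3, -6, -10, -12],
    [8, 4, -4, -8, -8, -4, 4, 8],
    [10, -3, -12, -6, 6, 12, 3, -10],
    [8, -8, -8, 8, 8, -8, -8, 8],
    [6, -12, 3, 10, -10, -3, 12, -6],
    [4, -8, 8, -4, -4, 8, -8, 4],
    [3, -6, 10, -12, 12, -10, 6, -3],
]


# Constant multipliers built from shifts and adds (no general multipliers).
def x3(v):
    return (v << 1) + v


def x4(v):
    return v << 2


def x6(v):
    return (v << 2) + (v << 1)


def x8(v):
    return v << 3


def x10(v):
    return (v << 3) + (v << 1)


def x12(v):
    return (v << 3) + (v << 2)


def fwd_1d(p):
    """8-point forward core transform of one row (equals C8 @ p), butterfly form."""
    # Stage 1: sums and differences of mirrored inputs.
    a0, a1, a2, a3 = p[0] + p[7], p[1] + p[6], p[2] + p[5], p[3] + p[4]
    a4, a5, a6, a7 = p[0] - p[7], p[1] - p[6], p[2] - p[5], p[3] - p[4]
    # Stage 2: even part reuses a second level of butterflies.
    b0, b1 = a0 + a3, a1 + a2
    b2, b3 = a0 - a3, a1 - a2
    return [
        x8(b0 + b1),
        x12(a4) + x10(a5) + x6(a6) + x3(a7),
        x8(b2) + x4(b3),
        x10(a4) - x3(a5) - x12(a6) - x6(a7),
        x8(b0 - b1),
        x6(a4) - x12(a5) + x3(a6) + x10(a7),
        x4(b2) - x8(b3),
        x3(a4) - x6(a5) + x10(a6) - x12(a7),
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def fwd_2d(block):
    """Separable 2-D transform C8 @ X @ C8^T: row pass, transpose, row pass again."""
    rows = [fwd_1d(r) for r in block]             # first 1-D pass: X @ C8^T
    cols = [fwd_1d(r) for r in transpose(rows)]   # second 1-D pass on transposed data
    return transpose(cols)                        # back to (row, column) order
```

For a flat block of ones, `fwd_2d([[1] * 8 for _ in range(8)])` puts all the energy in the DC coefficient (4096 with this unnormalized matrix) and leaves the other 63 coefficients at zero. Reusing the same `fwd_1d` for both passes mirrors the idea of two identical 1-D stages separated by a transpose.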
A prototype of the proposed architecture has been synthesized in a 130 nm HCMOS technology process; it achieves a maximum clock frequency of 330 MHz with a throughput of 2640 Mpixels/s.
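As an arithmetic check on the reported figures (our own calculation, not a claim made in the abstract), dividing the throughput by the clock frequency gives the average number of pixels the pipeline delivers per cycle:

```latex
\frac{2640\ \text{Mpixels/s}}{330\ \text{MHz}} = 8\ \text{pixels/cycle},
\qquad
\frac{64\ \text{pixels per } 8\times 8\ \text{block}}{8\ \text{pixels/cycle}} = 8\ \text{cycles per block}.
```

That is, once the pipeline is full, one 8×8 block is completed every eight cycles on average. Reading this as one 8-sample row per cycle through each 1-D processor is an assumption, since the abstract does not state the per-cycle input width.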