In H.264/AVC, a deblocking filter improves visual quality by reducing blocking artifacts in decoded video frames. The deblocking filter accounts for one third of the computational complexity of the decoder. This paper exploits scalability at both the hardware and the algorithmic level to improve performance and to reduce computational complexity. First, we propose a modular deblocking filter architecture that can be scaled to match the computing capability required for the bit-rate, resolution, and frame rate of a given video sequence. The scalable architecture is implemented on an FPGA using dynamic partial reconfiguration, a feature that allows different hardware configurations to be loaded at run-time. The proposed design can be scaled to filter up to four different edges simultaneously, resulting in a significant reduction of the total processing time. Second, our experiments show that computational complexity can be reduced significantly by exploiting the larger number of skipped macroblocks at lower bit-rates, thus avoiding redundant filtering operations. The implemented architecture is evaluated on the Xilinx Virtex-4 ML410 FPGA board, where the design operates at a maximum frequency of 103 MHz. Reconfiguration is performed through the Internal Configuration Access Port (ICAP) to achieve the performance needed by real-time applications.
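
The two ideas in the abstract, skipping edges that need no filtering and filtering several edges at once, can be illustrated with a rough Python sketch. It is not the paper's architecture. The function names, the single boundary-strength value per edge, and the assumption that any group of edges can be filtered in parallel are simplifications made here for illustration; a real decoder derives boundary strength per edge segment and must respect the filtering order between edges.

```python
from math import ceil

# A 16x16 luma macroblock with 4x4 transforms has 4 vertical and
# 4 horizontal block edges (chroma edges are ignored in this sketch).
LUMA_EDGES_PER_MB = 8


def edges_to_filter(boundary_strengths):
    """Return indices of edges that need filtering.

    In H.264/AVC an edge with boundary strength (bS) 0 is left unfiltered.
    Assumption: one bS value per edge, which simplifies the standard.
    """
    return [i for i, bs in enumerate(boundary_strengths) if bs > 0]


def filter_passes(num_edges, parallel_units):
    """Passes needed when up to `parallel_units` edges are filtered at once.

    Hypothetical model: ignores data dependencies between edges.
    """
    if parallel_units < 1:
        raise ValueError("parallel_units must be at least 1")
    return ceil(num_edges / parallel_units)


# Example: a macroblock where only two of the eight luma edges have bS > 0.
bs_per_edge = [0, 0, 2, 0, 1, 0, 0, 0]
active = edges_to_filter(bs_per_edge)
passes_one_unit = filter_passes(len(active), parallel_units=1)
passes_four_units = filter_passes(len(active), parallel_units=4)
```

Under this model, a macroblock whose edges all have bS 0 costs no filter passes at all, which is the kind of saving the abstract attributes to skipped macroblocks at lower bit-rates, while the `parallel_units` parameter stands in for the number of edge filters configured on the FPGA.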