A 136 cycles/MB, luma-chroma parallelized H.264/AVC deblocking filter for QFHD applications
ICME'09 Proceedings of the 2009 IEEE international conference on Multimedia and Expo
Hardware design of motion data decoding process for H.264/AVC
Image Communication
Methods for Power/Throughput/Area Optimization of H.264/AVC Decoding
Journal of Signal Processing Systems
Exploiting parallelism in the H.264 deblocking filter by operation reordering
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part I
FPGA based efficient on-chip memory for image processing algorithms
Microelectronics Journal
De-blocking filter design for HEVC and H.264/AVC
PCM'12 Proceedings of the 13th Pacific-Rim conference on Advances in Multimedia Information Processing
135-MHz 258-K gates VLSI design for all-intra H.264/AVC scalable video encoder
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A Very High Throughput Deblocking Filter for H.264/AVC
Journal of Signal Processing Systems
This paper describes the design and VLSI implementation of a highly efficient deblocking filter based on single-port SRAM. It achieves a throughput of 204 cycles per macroblock for real-time H.264/AVC decoding. Several deblocking filter designs from the literature are compared, and the feasibility of implementing them as pipelines is studied. Building on this analysis, we propose a new design with a five-stage pipeline and clock gating, which increases system throughput while reducing power. Data hazards and structural hazards, the two most critical issues for a pipelined filter, are analyzed and resolved. Efficient memory organization is employed for both the on-chip SRAM and the transposition buffers. Using a novel hybrid edge-filtering sequence and an out-of-order memory-update scheme, the pipeline runs with zero stall cycles in normal operation, fully exploiting the pipelined architecture. Compared with existing designs, ours requires at least 18% fewer clock cycles and consumes 20% less power, owing to its efficient pipeline and memory architecture. Its total gate count is comparable to other designs in the literature, without using any expensive two-port or dual-port on-chip SRAM.
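To put the 204 cycles/macroblock figure in context, the minimum filter clock for real-time operation is (macroblocks per frame) × (frames per second) × (cycles per macroblock). The sketch below is illustrative only: the video formats and frame rates are assumptions, not values taken from the paper, and it assumes the filter sustains exactly 204 cycles per macroblock with no extra stalls between macroblocks or frames. The function names are hypothetical.

```python
def macroblocks_per_frame(width: int, height: int) -> int:
    """Count 16x16 macroblocks in a frame.

    Dimensions are rounded up to a multiple of 16, as in the coded picture
    (e.g. a 1080-line frame is coded as 1088 lines, i.e. 68 macroblock rows).
    """
    mb_cols = (width + 15) // 16
    mb_rows = (height + 15) // 16
    return mb_cols * mb_rows


def required_clock_hz(width: int, height: int, fps: int, cycles_per_mb: int) -> int:
    """Minimum filter clock in Hz for real-time operation.

    Assumption: a fixed, sustained cycles-per-macroblock figure with no
    additional pipeline stalls or inter-frame overhead.
    """
    return macroblocks_per_frame(width, height) * fps * cycles_per_mb


if __name__ == "__main__":
    CYCLES_PER_MB = 204  # throughput figure reported in the abstract
    # Illustrative formats (assumed, not from the paper).
    formats = [("720p30", 1280, 720, 30), ("1080p30", 1920, 1080, 30)]
    for name, w, h, fps in formats:
        hz = required_clock_hz(w, h, fps, CYCLES_PER_MB)
        print(f"{name}: {macroblocks_per_frame(w, h)} MBs/frame, {hz / 1e6:.2f} MHz")
```

Under these assumptions, 1080p30 needs 8160 macroblocks per frame and a clock of roughly 49.94 MHz. Any stall cycles, or a higher cycle count per macroblock, raise the required clock proportionally, which is why eliminating pipeline stalls translates directly into a lower operating frequency or more headroom.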