Parallel, Pipelined and Folded Architectures for Computation of 1-D and 2-D DCT in Image and Video Codec

Authors:
Shen-Fu Hsiao;Jian-Ming Tseng
Affiliations:
Institute of Computer and Information Engineering, National Sun Yat-Sen University, No. 70, Lien-Hai Road, Kaohsiung 804, Taiwan, R.O.C;Institute of Computer and Information Engineering, National Sun Yat-Sen University, No. 70, Lien-Hai Road, Kaohsiung 804, Taiwan, R.O.C
Venue:
Journal of VLSI Signal Processing Systems - Parallel VLSI architectures for image and video processing
Year:
2001

Citing 7

A CORDIC-based unified systolic architecture for sliding window applications of discrete transforms
IEEE Transactions on Signal Processing

A refined fast 2-D discrete cosine transform algorithm
IEEE Transactions on Signal Processing

Unified systolic arrays for computation of the DCT/DST/DHT
IEEE Transactions on Circuits and Systems for Video Technology

A cost-effective architecture for 8×8 two-dimensional DCT/IDCT using direct method
IEEE Transactions on Circuits and Systems for Video Technology

High-throughput VLSI architectures for the 1-D and 2-D discrete cosine transforms
IEEE Transactions on Circuits and Systems for Video Technology

A 100 MHz 2-D 8×8 DCT/IDCT processor for HDTV applications
IEEE Transactions on Circuits and Systems for Video Technology

High throughput CORDIC-based systolic array design for the discrete cosine transform
IEEE Transactions on Circuits and Systems for Video Technology

Cited 2

A JPEG Chip for Image Compression and Decompression
Journal of VLSI Signal Processing Systems

Mapping of Discrete Cosine Transforms onto Distributed Hardware Architectures
Journal of Signal Processing Systems

Abstract

Several parallel, pipelined and folded architectures with different throughput rates are presented for computation of the DCT, one of the fundamental operations in image/video coding. This paper begins with a new decomposition algorithm for the 1-D DCT coefficient matrix. The 2-D DCT problem is then converted into its 1-D counterpart through a regular index mapping technique. Afterward, depending on the trade-off between hardware complexity and speed, the derived decomposition algorithm is mapped onto different parallel-pipelined and folded architectures that realize the butterfly operations and the post-processing operations. Compared with other DCT processors, the proposed parallel-pipelined architectures require no intermediate transpose memory and offer modularity, regularity, locality, scalability, and pipelinability, with arithmetic hardware cost proportional to the logarithm of the transform length.
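
As background for the idea of treating a 2-D DCT as a 1-D problem, the following minimal Python sketch compares two textbook formulations: the row-column method, which needs a transpose between its two 1-D passes, and a single matrix-vector product on the flattened block using a Kronecker-product coefficient matrix. This is not the decomposition algorithm or the index mapping of the paper, whose details the abstract does not give; the function names (`dct_matrix`, `dct2_row_column`, `dct2_as_1d`) and the orthonormal DCT-II scaling are assumptions chosen for illustration.

```python
import math


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II coefficient matrix (assumed scaling)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def dct2_row_column(block):
    """2-D DCT by the row-column method: 1-D DCT on rows, transpose, 1-D DCT on columns."""
    c = dct_matrix(len(block))
    stage1 = [matvec(c, row) for row in block]              # transform every row
    stage2 = [matvec(c, col) for col in transpose(stage1)]  # transform every column
    return transpose(stage2)


def dct2_as_1d(block):
    """2-D DCT as one matrix-vector product on the row-major flattened block."""
    n = len(block)
    c = dct_matrix(n)
    x = [block[i][j] for i in range(n) for j in range(n)]
    # Kronecker product of c with itself: row (k, l), column (i, j).
    big = [[c[k][i] * c[l][j] for i in range(n) for j in range(n)]
           for k in range(n) for l in range(n)]
    y = matvec(big, x)
    return [y[k * n:(k + 1) * n] for k in range(n)]


if __name__ == "__main__":
    sample = [[(i * 8 + j) % 13 for j in range(8)] for i in range(8)]
    a = dct2_row_column(sample)
    b = dct2_as_1d(sample)
    worst = max(abs(a[k][l] - b[k][l]) for k in range(8) for l in range(8))
    print("largest difference between the two formulations:", worst)
```

The flattened form multiplies by an N²×N² matrix, so it is far more expensive than the row-column method when evaluated naively; it only becomes attractive when the coefficient matrix is factored into sparse stages, which is the kind of decomposition the abstract refers to.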