Using a specific input-restructuring sequence, a new VLSI algorithm and architecture are derived for a high-throughput, memory-based systolic-array implementation of the discrete cosine transform (DCT). The proposed restructuring technique transforms the DCT algorithm into a cycle-convolution and a pseudo-cycle-convolution structure as basic computational forms. The solution has been specifically designed to have good fixed-point error performance, which is exploited to further reduce hardware complexity and power consumption. It leads to a ROM-based VLSI kernel with good quantization properties. The result is a parallel VLSI algorithm and architecture whose fixed-point behavior is well suited to a memory-based implementation. The proposed algorithm can be mapped onto two linear systolic arrays of similar length and form, which can then be efficiently merged into a single array using an appropriate hardware-sharing technique. A highly efficient VLSI chip can thus be obtained, with appealing features such as good architectural topology, processing speed, hardware complexity, and I/O costs. Moreover, the proposed solution substantially reduces the hardware overhead of the pre-processing stage, which for short-length DCTs consumes a significant percentage of the chip area.
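The general idea of recasting a cosine-kernel transform as a cyclic convolution can be illustrated with a small sketch. This is not the restructuring sequence proposed here, whose details the abstract does not give. It is only the classical index-permutation argument for a prime length N, applied to the simplified kernel cos(2πnk/N) rather than the full DCT kernel. The function names, the choice N = 7, and the primitive root g = 3 are assumptions made for illustration.

```python
import math

def cyclic_convolution(a, h):
    """Length-L cyclic convolution: y[j] = sum_i a[i] * h[(j - i) mod L]."""
    L = len(a)
    return [sum(a[i] * h[(j - i) % L] for i in range(L)) for j in range(L)]

def cosine_kernel_direct(x, N):
    """Reference: Y[k] = sum_{n=1}^{N-1} x[n] * cos(2*pi*n*k/N), k = 1..N-1."""
    return {
        k: sum(x[n] * math.cos(2 * math.pi * n * k / N) for n in range(1, N))
        for k in range(1, N)
    }

def cosine_kernel_via_convolution(x, N, g):
    """Same sums, computed as one cyclic convolution of length N-1.

    Assumes N is prime and g is a primitive root modulo N (illustrative choice).
    Writing n = g^(-i) and k = g^j gives n*k = g^(j-i) mod N, so the kernel
    depends only on (j - i) mod (N-1), which is a cyclic convolution.
    """
    L = N - 1
    powers = [pow(g, i, N) for i in range(L)]          # g^i mod N
    if sorted(powers) != list(range(1, N)):
        raise ValueError("g is not a primitive root modulo N")
    a = [x[powers[(-i) % L]] for i in range(L)]        # input permuted to x[g^(-i)]
    h = [math.cos(2 * math.pi * powers[i] / N) for i in range(L)]  # fixed coefficients
    y = cyclic_convolution(a, h)
    return {powers[j]: y[j] for j in range(L)}         # output index k = g^j

N, g = 7, 3                                            # assumed example values
x = [0.0, 1.0, -2.0, 0.5, 3.0, -1.5, 2.5]              # x[0] is unused by this kernel
direct = cosine_kernel_direct(x, N)
restructured = cosine_kernel_via_convolution(x, N, g)
max_error = max(abs(direct[k] - restructured[k]) for k in range(1, N))
```

Because the coefficient sequence `h` is fixed, every product in the convolution multiplies an input sample by a known constant. That property is what makes a table-lookup (ROM-based) realization of such kernels attractive.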