We propose a novel implementation of the discrete cosine transform (DCT) and the inverse DCT (IDCT) algorithms using a CORDIC (coordinate rotation digital computer)-based systolic processor array structure. First, we reformulate an N-point DCT or IDCT algorithm into a rotation formulation, which makes it suitable for CORDIC processor implementation. We then use a pipelined CORDIC processor as the basic building block to construct 1-D and 2-D systolic-type processor arrays that speed up the DCT and IDCT computation. Owing to the proposed rotation formulation, we achieve 100% processor utilization in both the 1-D and 2-D configurations. Furthermore, we show that for the 2-D configurations, the same data-processing throughput rate can be maintained as long as the processor array dimensions are increased linearly with N. Neither the algorithm formulation nor the array configuration needs to be modified. Hence, the proposed parallel architecture is scalable to the problem size. These desirable features make this implementation compare favorably with previously proposed DCT implementations.
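The abstract does not spell out its rotation formulation, so the following is only a minimal sketch of the general idea that a DCT or IDCT can be computed with nothing but vector rotations and additions. The formulation it uses is an illustrative assumption, not necessarily the one in the paper: each DCT-II output X(k) is the real part of a Horner-style recursion in which the partial sum is rotated by πk/N and the next sample is added, followed by one half-angle rotation; the IDCT reuses the same rotate-and-add cell. All function names (`cordic_rotate`, `horner_rotate`, `dct_ii_by_rotation`, `idct_by_rotation`) are hypothetical, and floating point stands in for the fixed-point shift-add datapath, so no pipelining, word length, or systolic scheduling is modeled.

```python
import math


def cordic_rotate(x, y, theta, iterations=32):
    """Rotate the vector (x, y) by theta radians with CORDIC rotation mode.

    Floating-point model (assumption): multiplying by 2**-i stands in for an
    i-bit right shift, and dividing by the accumulated gain stands in for the
    constant scale-factor correction a hardware CORDIC would apply.
    """
    # Reduce theta to [-pi, pi], then apply an exact quarter-turn pre-rotation so
    # the residual angle stays inside the CORDIC convergence range (about 1.74 rad).
    theta = math.remainder(theta, 2.0 * math.pi)
    if theta > math.pi / 2:
        x, y, theta = -y, x, theta - math.pi / 2
    elif theta < -math.pi / 2:
        x, y, theta = y, -x, theta + math.pi / 2
    gain = 1.0
    for i in range(iterations):
        d = 1.0 if theta >= 0.0 else -1.0  # micro-rotation direction
        t = 2.0 ** -i                      # tan of the i-th elementary angle
        x, y = x - d * y * t, y + d * x * t
        theta -= d * math.atan(t)
        gain *= math.sqrt(1.0 + t * t)     # each micro-rotation stretches by this factor
    return x / gain, y / gain


def horner_rotate(coeffs, angle, iterations=32):
    """Return (Re, Im) of sum_m coeffs[m] * exp(1j*m*angle) using rotate-and-add steps."""
    u, v = 0.0, 0.0
    for c in reversed(coeffs):
        u, v = cordic_rotate(u, v, angle, iterations)  # multiply partial sum by exp(1j*angle)
        u += c                                         # add the next real coefficient
    return u, v


def dct_ii_reference(x):
    """Direct (unnormalized) DCT-II: X[k] = sum_n x[n] cos(pi*(2n+1)*k / (2N))."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
            for k in range(N)]


def dct_ii_by_rotation(x, iterations=32):
    """Unnormalized DCT-II in which every multiplication is a CORDIC rotation."""
    N = len(x)
    out = []
    for k in range(N):
        phi = math.pi * k / N
        u, v = horner_rotate(x, phi, iterations)         # sum_n x[n] exp(1j*n*phi)
        u, _ = cordic_rotate(u, v, phi / 2, iterations)  # half-step supplies the (2n+1) phase
        out.append(u)
    return out


def idct_by_rotation(X, iterations=32):
    """Inverse of dct_ii_by_rotation (scaled DCT-III), built from the same rotate-and-add cell."""
    N = len(X)
    coeffs = [X[0] / N] + [2.0 * Xk / N for Xk in X[1:]]
    return [horner_rotate(coeffs, math.pi * (2 * n + 1) / (2 * N), iterations)[0]
            for n in range(N)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    X = dct_ii_by_rotation(x)
    err_fwd = max(abs(a - b) for a, b in zip(X, dct_ii_reference(x)))
    err_inv = max(abs(a - b) for a, b in zip(idct_by_rotation(X), x))
    print(f"max forward error: {err_fwd:.2e}, max round-trip error: {err_inv:.2e}")
```

In this sketch every output index (k for the DCT, n for the IDCT) is an independent chain of N identical rotate-and-add steps, which is one natural way such a computation could be assigned to the cells of a linear array. Whether this matches the paper's mapping, or achieves its stated 100% utilization, is not something the sketch demonstrates.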