This paper presents an efficient method for implementing the Discrete Cosine Transform (DCT) with distributed arithmetic. While conventional approaches use the original DCT algorithm or the even-odd frequency decomposition of the DCT algorithm, the proposed architecture uses the recursive DCT algorithm and requires less area than the conventional approaches, regardless of the memory reduction techniques employed in the ROM Accumulators (RACs). An efficient architecture for implementing the scaled DCT with distributed arithmetic is also proposed. The new architecture requires even less area while keeping the same structural regularity for an easy VLSI implementation. A comparison of synthesized DCT processors shows that the proposed method reduces the hardware area of regular and scaled DCT processors by 17 percent and 23 percent, respectively, relative to a conventional design. With the row-column decomposition method, the proposed architectures can be easily extended to compute the two-dimensional DCT required in many image compression applications such as HDTV.
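As background for the ROM Accumulator (RAC) mentioned above, the following is a minimal sketch of conventional distributed arithmetic applied to one DCT coefficient row. It does not reproduce the recursive architecture proposed in the paper. The function names (`dct_row_coeffs`, `build_rom`, `rac_inner_product`), the 12-bit coefficient quantization, and the 8-bit two's complement input width are all illustrative assumptions, and the DCT normalization factor is omitted.

```python
import math


def dct_row_coeffs(u, n=8, frac_bits=12):
    """Quantized cosine coefficients for output index u of an n-point DCT.

    Assumption: plain rounding to frac_bits fractional bits; the DCT
    normalization factor is left out for simplicity.
    """
    scale = 1 << frac_bits
    return [round(math.cos((2 * k + 1) * u * math.pi / (2 * n)) * scale)
            for k in range(n)]


def build_rom(coeffs):
    """Precompute every partial sum of the fixed coefficients.

    Bit i of the ROM address selects whether coeffs[i] is included, so the
    table has 2**len(coeffs) entries.
    """
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]


def rac_inner_product(rom, xs, bits):
    """Bit-serial ROM-accumulator evaluation of sum(c_k * x_k).

    Each x in xs must fit in `bits`-bit two's complement. One ROM lookup is
    made per bit position; the sign-bit lookup is subtracted, not added.
    """
    mask = (1 << bits) - 1
    words = [x & mask for x in xs]  # two's complement bit patterns
    acc = 0
    for j in range(bits):
        # Gather bit j of every input into one ROM address.
        addr = 0
        for i, w in enumerate(words):
            addr |= ((w >> j) & 1) << i
        term = rom[addr] * (1 << j)
        acc += -term if j == bits - 1 else term
    return acc


coeffs = dct_row_coeffs(u=1)
rom = build_rom(coeffs)
samples = [12, -7, 100, -128, 127, 0, -1, 55]
da_result = rac_inner_product(rom, samples, bits=8)
direct_result = sum(c * x for c, x in zip(coeffs, samples))
```

In this sketch the multiplications by the fixed cosine coefficients are replaced by one table lookup and one shifted add per input bit, which is why the size of the ROM dominates the cost of a RAC and why memory reduction techniques matter. Under the row-column decomposition, a two-dimensional DCT would apply the same one-dimensional transform first to every row and then to every column of the intermediate result.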