A Low Complexity Reconfigurable DCT Architecture to Trade off Image Quality for Power Consumption
Journal of Signal Processing Systems
High throughput DA-based DCT with high accuracy error-compensated adder tree
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A high performance video transform engine by using space-time scheduling strategy
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Conventional distributed arithmetic (DA) is popular in application-specific integrated circuit (ASIC) design, and it relies on on-chip ROM to achieve high speed and regularity. In this paper, a new DA architecture called NEDA is proposed, aimed at reducing the cost metrics of power and area while maintaining high speed and accuracy in digital signal processing (DSP) applications. Mathematical analysis proves that DA can implement the inner product of vectors in the form of two's complement numbers using only additions, followed by a small number of shifts at the final stage. Comparative studies show that NEDA outperforms widely used approaches such as multiply/accumulate (MAC) and DA in many aspects. Being a high-speed architecture free of ROM, multiplication, and subtraction, NEDA can also expose the redundancy existing in the adder array, whose entries are 0 and 1. A hardware compression scheme is introduced to generate a butterfly structure with a minimum number of additions. NEDA-based architectures for an 8 × 8 discrete cosine transform (DCT) core are presented as an example. Savings exceeding 88% are achieved when the compression scheme is applied along with NEDA. Finite word-length simulations demonstrate the viability and excellent performance of NEDA.
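The claim that an inner product with two's complement coefficients reduces to additions followed by a few final shifts can be illustrated with a minimal software sketch. This is not the NEDA hardware design: the function names, the integer coefficient format, and the bit width are assumptions made for illustration. In particular, the sketch handles the sign bit by negating one partial sum, which may differ from how the paper avoids subtraction.

```python
def to_twos_complement_bits(value, width):
    """Return the bits of `value`, LSB first, in `width`-bit two's complement."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in the given width")
    return [(value >> j) & 1 for j in range(width)]


def adder_only_inner_product(coeffs, inputs, width):
    """Compute sum(c * x) without multiplying (illustrative sketch only).

    Assumption: coefficients are fixed integers that fit in `width` bits.
    """
    # 0/1 matrix: one row per coefficient, one column per bit position.
    bit_matrix = [to_twos_complement_bits(c, width) for c in coeffs]

    # Adder stage: for each bit position, add the inputs whose
    # coefficient has a 1 in that position.
    partial_sums = []
    for j in range(width):
        total = 0
        for bits, x in zip(bit_matrix, inputs):
            if bits[j]:
                total += x
        partial_sums.append(total)

    # Final stage: shift each partial sum by its bit weight and accumulate.
    result = 0
    for j in range(width - 1):
        result += partial_sums[j] << j
    # The sign bit has weight -2^(width-1) in two's complement.
    # Assumption: modelled here as a negation of that partial sum.
    result += (-partial_sums[width - 1]) << (width - 1)
    return result


coeffs = [3, -2, 5, -7]
inputs = [10, -4, 6, 1]
print(adder_only_inner_product(coeffs, inputs, width=4))
print(sum(c * x for c, x in zip(coeffs, inputs)))  # reference value
```

Both printed values should agree. Rows of the 0/1 matrix that share the same pattern of ones repeat the same additions, which is the kind of redundancy the abstract says a compression scheme can remove.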