Distributed arithmetic (DA)-based computation is popular for its potential for efficient memory-based implementation of finite impulse response (FIR) filters, where each filter output is computed as the inner product of an input-sample vector and the filter-coefficient vector. In this paper, however, we show that the look-up-table (LUT)-multiplier-based approach, in which the memory elements store the products of each filter coefficient with all possible input-operand values, can be an area-efficient alternative to DA-based FIR filter design at the same throughput. Using operand decomposition and inner-product decomposition, respectively, we design conventional LUT-multiplier-based and DA-based FIR filter structures of equivalent throughput. Compared with the DA-based design, the LUT-multiplier-based design requires nearly the same memory, the same number of adders, and fewer input registers, at the cost of slightly wider adders. Moreover, we present two new approaches to LUT-based multiplication that can reduce the memory size to half that of conventional LUT-based multiplication. We also present a modified transposed-form FIR filter in which a single segmented memory core with only one pair of decoders is used to minimize the combinational area. The proposed LUT-based FIR filter is found to require nearly half the memory space and 1/N times the decoder and input-register complexity, at the cost of a marginal increase in adder width and roughly 4N × W additional AND-OR-INVERT gates and 2N × W NOR gates, where N is the filter length and W is the word length. We synthesized the DA-based and LUT-multiplier-based designs of 16-tap FIR filters with Synopsys Design Compiler using the TSMC 90 nm library and found that the proposed LUT-multiplier-based design requires nearly 15% less area than the DA-based design at the same throughput and with lower latency.
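The contrast between inner-product decomposition (DA) and operand decomposition (LUT multiplier) can be illustrated with a minimal Python sketch. It assumes unsigned integer input samples of `bits` bits and integer coefficients; practical DA designs also handle two's-complement inputs by subtracting the sign-bit-plane term, which is omitted here. The DA version addresses a single 2^N-word table with one bit from every input per step, while the LUT-multiplier version splits each input into L-bit digits and reads a 2^L-word product table per coefficient. The function names (`da_lut`, `da_inner_product`, `lut_multiplier`, `lut_inner_product`) are hypothetical, and the sketch models only the arithmetic, not the hardware structures analyzed in the paper.

```python
# Minimal sketch, assuming unsigned integer inputs and integer coefficients.
# Names are hypothetical; this models arithmetic, not hardware timing.

def da_lut(coeffs):
    """DA table: entry `addr` holds the sum of the coefficients whose bit is set in addr."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_inner_product(coeffs, xs, bits):
    """Inner-product decomposition: one bit-plane of all inputs addresses the table per step."""
    table = da_lut(coeffs)
    acc = 0
    for j in range(bits):
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> j) & 1) << k    # bit j of input k becomes address bit k
        acc += table[addr] << j            # weight the partial sum by 2**j
    return acc


def lut_multiplier(coeff, x, bits, l):
    """Operand decomposition: x is split into l-bit digits; one 2**l-word product table."""
    table = [coeff * d for d in range(1 << l)]   # all products of coeff with an l-bit digit
    result = 0
    for shift in range(0, bits, l):
        digit = (x >> shift) & ((1 << l) - 1)
        result += table[digit] << shift
    return result


def lut_inner_product(coeffs, xs, bits, l):
    """Sum of the per-coefficient LUT-multiplier products."""
    return sum(lut_multiplier(c, x, bits, l) for c, x in zip(coeffs, xs))


coeffs = [3, -1, 4, 2]   # toy coefficients (N = 4)
xs = [5, 9, 2, 7]        # 4-bit unsigned input samples
print(da_inner_product(coeffs, xs, bits=4))         # → 28
print(lut_inner_product(coeffs, xs, bits=4, l=2))   # → 28
```

In this toy case (N = 4, L = 2) the DA table and the four product tables each hold 16 words in total; how the two approaches compare at equal throughput in hardware depends on the structures in the paper, not on this sketch.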
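Two further ideas from the abstract, halving the product LUT and sharing one memory core across all taps of a transposed-form filter, can also be sketched in Python. The abstract does not describe its two new LUT approaches, so this sketch swaps in odd-multiple storage as an assumption: for a nonzero L-bit digit d = m·2^s with m odd, only A·m is stored (2^(L-1) words instead of 2^L), and the product is recovered by shifting left by s. Because every tap of a transposed-form filter multiplies the same input sample, each digit is decoded once and that single address reads every coefficient segment. Inputs are assumed unsigned, and `odd_multiple_table`, `oms_multiply`, `TransposedFIR`, and `direct_fir` are hypothetical names.

```python
# Minimal sketch, assuming unsigned inputs; names are hypothetical.
# Odd-multiple storage is a swapped-in technique, not necessarily the paper's.

def odd_multiple_table(coeff, l):
    """Store coeff * m only for odd m = 1, 3, ..., 2**l - 1 (2**(l-1) words)."""
    return [coeff * m for m in range(1, 1 << l, 2)]


def oms_multiply(table, digit):
    """Recover coeff * digit from odd multiples: digit = m * 2**s with m odd."""
    if digit == 0:
        return 0
    s = (digit & -digit).bit_length() - 1   # count of trailing zero bits
    m = digit >> s
    return table[(m - 1) // 2] << s


class TransposedFIR:
    """Transposed-form FIR whose taps share one segmented product memory."""

    def __init__(self, coeffs, bits, l):
        self.bits, self.l = bits, l
        # One segment per coefficient; every segment is read with the same address.
        self.core = [odd_multiple_table(c, l) for c in coeffs]
        self.state = [0] * (len(coeffs) - 1)   # partial-sum registers

    def _products(self, x):
        prods = [0] * len(self.core)
        for shift in range(0, self.bits, self.l):
            digit = (x >> shift) & ((1 << self.l) - 1)   # decoded once per digit
            for k, seg in enumerate(self.core):
                prods[k] += oms_multiply(seg, digit) << shift
        return prods

    def step(self, x):
        """Consume one input sample and return one output sample."""
        p = self._products(x)
        y = p[0] + (self.state[0] if self.state else 0)
        m = len(self.state)
        # state[k] <- h[k+1] * x + state[k+1]; the last register takes h[N-1] * x only.
        self.state = [p[k + 1] + (self.state[k + 1] if k + 1 < m else 0)
                      for k in range(m)]
        return y


def direct_fir(coeffs, xs):
    """Reference: y[n] = sum_k h[k] * x[n - k] with zero initial conditions."""
    return [sum(h * xs[n - k] for k, h in enumerate(coeffs) if n >= k)
            for n in range(len(xs))]


fir = TransposedFIR([3, -1, 4, 2], bits=4, l=2)
samples = [5, 9, 2, 7, 0, 15]
print([fir.step(x) for x in samples])       # → [15, 22, 17, 65, 19, 77]
print(direct_fir([3, -1, 4, 2], samples))   # → [15, 22, 17, 65, 19, 77]
```

With L = 2, each segment holds 2 words instead of the 4 a full product table would need, and the transposed-form outputs match the direct-form reference. A hardware design would also need the shifter and zero-detection logic that this sketch stands in for with `<<` and the `digit == 0` check.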