Reduced computational redundancy implementation of DSP algorithms using computation sharing vector scaling

Authors:
Khurram Muhammad; Kaushik Roy
Affiliations:
Texas Instruments, Inc., Dallas, TX; School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year:
2002

Abstract

In this paper, we present a general approach that targets the reduction of redundant computation in common digital signal processing (DSP) tasks such as filtering and matrix multiplication. We show that such tasks can be expressed as the multiplication of vectors by scalars, which allows fast multiplication by sharing computation. The vector scaling operation is decomposed to find the most effective precomputations, which yield a fast multiplier implementation. Two decomposition approaches are presented, one based on a greedy decomposition and the other on fixed-size lookup, leading to two multiplier architectures for vector-scalar products. Analog simulation of an example multiplier shows a speed advantage by a factor of about 1.85 over a conventional carry-save array multiplier. Further simulations using 0.18 µm technology show up to a 20% speed advantage over Booth-encoded Wallace tree multipliers.
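
The idea of sharing computation across a vector-scalar product can be illustrated with a minimal software sketch. It is not the architecture from the paper: it assumes non-negative integer coefficients, a 4-bit chunk size, and a table holding every chunk multiple of the scalar, and the names `precompute_multiples` and `scale_vector` are hypothetical. It only loosely mirrors the fixed-size lookup decomposition: the multiples of the scalar are computed once, and each coefficient product is then formed by lookups, shifts, and additions alone.

```python
def precompute_multiples(x, chunk_bits=4):
    """Build the shared table k*x for every possible chunk value k.

    Uses additions only; this work is done once per scalar x and is
    reused by every coefficient in the vector (assumed table layout).
    """
    table = [0] * (1 << chunk_bits)
    for k in range(1, 1 << chunk_bits):
        table[k] = table[k - 1] + x
    return table


def scale_vector(coeffs, x, chunk_bits=4):
    """Return [c * x for c in coeffs] without a per-coefficient multiply.

    Each non-negative coefficient is split into fixed-size chunks; every
    chunk selects a precomputed multiple of x, which is shifted into
    position and accumulated.
    """
    table = precompute_multiples(x, chunk_bits)
    mask = (1 << chunk_bits) - 1
    products = []
    for c in coeffs:
        if c < 0:
            raise ValueError("this sketch assumes non-negative coefficients")
        acc, shift = 0, 0
        while c:
            acc += table[c & mask] << shift  # select, shift, add
            c >>= chunk_bits
            shift += chunk_bits
        products.append(acc)
    return products


if __name__ == "__main__":
    taps = [3, 17, 250, 1024]  # hypothetical filter coefficients
    sample = 13                # one input sample scaling the whole vector
    print(scale_vector(taps, sample))
```

In this sketch the cost of building the table is amortized over all coefficients, which is the sense in which the scalar's precomputations are shared. A hardware realization would weigh table size against the number of adders, a tradeoff the sketch does not model.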