Optimizing FPGA-Based Vector Product Designs

Authors:
Dan Benyamin;John Villasenor;Wayne Luk
Venue:
FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Year:
1999

Abstract

This paper presents a method, called multiple constant multiplier trees (MCMTs), for producing optimized reconfigurable hardware implementations of vector products. An algorithm for generating MCMTs, based on a novel representation of common sub-expressions in constant data patterns, has been developed and implemented. Our optimization framework covers a wider solution space than previous approaches; it also supports exploitation of full and partial run-time reconfiguration, as well as technology-specific constraints such as fan-out limits and routing. We demonstrate that, while distributed arithmetic techniques require storage that grows exponentially with the number of coefficients, the resource utilization of MCMTs usually grows linearly with problem size. MCMTs have been implemented in Xilinx 4000 and Virtex FPGAs, and their size and speed efficiency are confirmed in comparisons with Xilinx LogiCORE and ASIC implementations of FIR filter designs. Preliminary results show that MCMT circuits are less than half the size of comparable distributed arithmetic cores.
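
To make the idea of sharing common sub-expressions concrete, the following is a minimal Python sketch, not the MCMT generation algorithm from the paper. It models a constant-coefficient vector product built only from shifts and adders, then shows one hand-picked factoring in which two coefficients share a bit pattern. The function names (`set_bits`, `naive_vector_product`, `shared_5_13`, `da_lut_entries`), the coefficients 5 and 13, and the adder-count cost model are all assumptions chosen for illustration. The last function assumes a basic distributed arithmetic scheme that stores one precomputed sum per combination of input bits.

```python
def set_bits(c):
    """Bit positions that are 1 in the non-negative integer constant c."""
    return [k for k in range(c.bit_length()) if (c >> k) & 1]


def naive_vector_product(coeffs, xs):
    """Model sum(c_i * x_i) with shifts and adds only, without any sharing.

    Returns (result, adder_count). Each set bit of each coefficient
    contributes one shifted copy of the matching input.
    """
    terms = [x << k for c, x in zip(coeffs, xs) for k in set_bits(c)]
    # Summing t terms takes t - 1 two-input adders (0 if there are no terms).
    return sum(terms), max(len(terms) - 1, 0)


def shared_5_13(x0, x1):
    """Hand-factored 5*x0 + 13*x1 (illustrative only).

    5 = 0b0101 and 13 = 0b1101 both contain the pattern 0b0101, so the
    sum (x0 + x1) can be formed once and reused at bit positions 0 and 2.
    """
    s = x0 + x1          # adder 1: shared sub-expression
    y = s + (s << 2)     # adder 2: 5 * (x0 + x1)
    y = y + (x1 << 3)    # adder 3: the remaining 8 * x1
    return y, 3


def da_lut_entries(num_coeffs):
    """Table entries for a basic distributed arithmetic look-up table,
    assuming one precomputed sum per combination of input bits."""
    return 2 ** num_coeffs
```

In this toy case, `naive_vector_product([5, 13], [3, 4])` and `shared_5_13(3, 4)` both evaluate to 67, but the unshared form needs four adders while the factored form needs three. Under the stated assumption, `da_lut_entries(n)` doubles with every added coefficient, which is the exponential storage growth the abstract contrasts with adder-based structures.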