Optimal Circuits for Parallel Multipliers

Authors:
Paul F. Stelling;Charles U. Martel;Vojin G. Oklobdzija;R. Ravi
Affiliations:
The Aerospace Corp., Los Angeles, CA;Univ. of California at Davis, Davis;Integration, Berkeley, CA;Carnegie Mellon Univ., Pittsburgh, PA
Venue:
IEEE Transactions on Computers
Year:
1998

Citing 11
Cited 36

Introduction to algorithms

Introduction to algorithms
Shallow multiplication circuits and wise financial investments

STOC '92 Proceedings of the twenty-fourth annual ACM symposium on Theory of computing
Optimal carry save networks

Proceedings of the London Mathematical Society symposium on Boolean function complexity
Improving multiplier design by using improved column compression tree and optimized final adder in CMOS technology

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Design strategies for optimal hybrid final adders in a parallel multiplier

Journal of VLSI Signal Processing Systems - Special issue on VLSI arithmetic and implementations
Computer Arithmetic I (Tutorial)

Computer Arithmetic I (Tutorial)
Computer Arithmetic: Principles, Architecture and Design

Computer Arithmetic: Principles, Architecture and Design
A Method for Speed Optimized Partial Product Reduction and Generation of Fast Parallel Multipliers Using an Algorithmic Approach

IEEE Transactions on Computers
Design strategies for the final adder in a parallel multiplier

ASILOMAR '95 Proceedings of the 29th Asilomar Conference on Signals, Systems and Computers (2-Volume Set)
Implementing Multiply-Accumulate Operation in Multiplication Time

ARITH '97 Proceedings of the 13th Symposium on Computer Arithmetic (ARITH '97)
Design and Clocking of VLSI Multipliers

Design and Clocking of VLSI Multipliers

High-Speed Booth Encoded Parallel Multiplier Design

IEEE Transactions on Computers - Special issue on computer arithmetic
An Optimal Allocation of Carry-Save-Adders in Arithmetic Circuits

IEEE Transactions on Computers
Layout-aware synthesis of arithmetic circuits

Proceedings of the 39th annual Design Automation Conference
Computer arithmetic and hardware: "off the shelf" microprocessors versus "custom hardware"

Theoretical Computer Science
Morphable Multipliers

FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
Built-in Test with Modified-Booth High-Speed Pipelined Multipliers and Dividers

Journal of Electronic Testing: Theory and Applications
Leakage power minimization for the synthesis of parallel multiplier circuits

Proceedings of the 14th ACM Great Lakes symposium on VLSI
Divide-and-concatenate: an architecture level optimization technique for universal hash functions

Proceedings of the 41st annual Design Automation Conference
An integrated approach to timing-driven synthesis and placement of arithmetic circuits

Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Adding Limited Reconfigurability to Superscalar Processors

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Tight integration of timing-driven synthesis and placement of parallel multiplier circuits

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
High-Performance Low-Power Left-to-Right Array Multiplier Design

IEEE Transactions on Computers
Low Cost Test Vector Compression/Decompression Scheme for Circuits with a Reconfigurable Serial Multiplier

ISVLSI '05 Proceedings of the IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design
Improved use of the carry-save representation for the synthesis of complex arithmetic circuits

Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Towards the automatic exploration of arithmetic-circuit architectures

Proceedings of the 43rd annual Design Automation Conference
Toward architecture-based test-vector generation for timing verification of fast parallel multipliers

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A Simple High-Speed Multiplier Design

IEEE Transactions on Computers
Automatic synthesis of compressor trees: reevaluating large counters

Proceedings of the conference on Design, automation and test in Europe
Enhancing FPGA performance for arithmetic circuits

Proceedings of the 44th annual Design Automation Conference
Progressive decomposition: a heuristic to structure arithmetic circuits

Proceedings of the 44th annual Design Automation Conference
A novel FPGA logic block for improved arithmetic performance

Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Architectural improvements for field programmable counter arrays: enabling efficient synthesis of fast compressor trees on FPGAs

Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Efficient synthesis of compressor trees on FPGAs

Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Partial product reduction by using look-up tables for M×N multiplier

Integration, the VLSI Journal
Improving synthesis of compressor trees on FPGAs via integer linear programming

Proceedings of the conference on Design, automation and test in Europe
An FPGA Logic Cell and Carry Chain Configurable as a 6:2 or 7:2 Compressor

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Iterative layering: optimizing arithmetic circuits by structuring the information flow

Proceedings of the 2009 International Conference on Computer-Aided Design
Dual channel addition based FFT processor architecture for signal and image processing

International Journal of High Performance Systems Architecture
Energy efficient implementation of parallel CMOS multipliers with improved compressors

Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
Truncated binary multipliers with variable correction and minimum mean square error

IEEE Transactions on Circuits and Systems Part I: Regular Papers
Optimized design of parallel carry-select adders

Integration, the VLSI Journal
Multi-operand adder synthesis on FPGAs using generalized parallel counters

Proceedings of the 2010 Asia and South Pacific Design Automation Conference
New hardware architecture for bit-counting

ACOS'06 Proceedings of the 5th WSEAS international conference on Applied computer science
Power and delay aware synthesis of multi-operand adders targeting LUT-based FPGAs

Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Compressor tree synthesis on commercial high-performance FPGAs

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Synthesis of Adaptable Hybrid Adders for Area Optimization under Timing Constraint

ACM Transactions on Design Automation of Electronic Systems (TODAES)

Quantified Score

Hi-index	14.99

Abstract

We present new design and analysis techniques for the synthesis of parallel multiplier circuits that have smaller predicted delay than the best current multipliers. In [4], Oklobdzija et al. suggested a new approach, the Three-Dimensional Method (TDM), for Partial Product Reduction Tree (PPRT) design that produces multipliers that outperform the current best designs. The goal of TDM is to produce a minimum-delay PPRT using full adders. This is done by carefully modeling the relationship of the output delays to the input delays in an adder and then interconnecting the adders in a globally optimal way. Oklobdzija et al. suggested a good heuristic for finding the optimal PPRT, but gave no proofs about its performance. We provide a formal characterization of optimal PPRT circuits and prove a number of properties about them. For the problem of summing a set of input bits within the minimum delay, we present an algorithm that produces a minimum-delay circuit in time linear in the size of the inputs. Our techniques allow us to prove tight lower bounds on multiplier circuit delays. These results are combined to create a program that finds optimal TDM multiplier designs. Using this program, we show that, while the heuristic used in [4] does not always find the optimal TDM circuit, it performs very well in terms of overall PPRT circuit delay. However, our search algorithms find better PPRT circuits for reducing the delay of the entire multiplier.
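
To make the idea of delay-aware adder interconnection concrete, the following is a minimal sketch of a greedy, single-column reduction in the spirit of a TDM-style heuristic. It is not the algorithm from the paper or from [4]. The delay constants, the function names (`full_adder`, `reduce_column`), and the rule of always feeding the three earliest-arriving bits into the next full adder are all illustrative assumptions.

```python
import heapq

# Hypothetical full-adder delay model, in XOR-gate units (an assumption,
# not figures from the paper): the two earlier inputs pass through two
# gate levels, the latest input through one.
D_XY_SUM = 2
D_Z_SUM = 1
D_XY_CARRY = 2
D_Z_CARRY = 1


def full_adder(x, y, z):
    """Return (sum_time, carry_time) for input arrival times x <= y <= z."""
    early = max(x, y)
    sum_time = max(early + D_XY_SUM, z + D_Z_SUM)
    carry_time = max(early + D_XY_CARRY, z + D_Z_CARRY)
    return sum_time, carry_time


def reduce_column(arrivals):
    """Greedily reduce one column of bits until at most two remain.

    Each step connects the three earliest-arriving bits to a full adder.
    The sum bit stays in this column; the carry bit is collected so a
    caller could pass it on to the next more significant column.
    Returns (remaining_bit_times, carry_times).
    """
    heap = list(arrivals)
    heapq.heapify(heap)
    carries = []
    while len(heap) > 2:
        x = heapq.heappop(heap)
        y = heapq.heappop(heap)
        z = heapq.heappop(heap)
        sum_time, carry_time = full_adder(x, y, z)
        heapq.heappush(heap, sum_time)
        carries.append(carry_time)
    return sorted(heap), carries


if __name__ == "__main__":
    # Four bits that all arrive at time 0.
    print(reduce_column([0, 0, 0, 0]))
```

Connecting the latest-arriving bit to the fast input is what lets a slow signal skip one gate level in this model. Whether such a greedy choice is globally optimal is exactly the kind of question the abstract says the heuristic in [4] left unproven.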