Introduction to algorithms
Shallow multiplication circuits and wise financial investments
STOC '92 Proceedings of the twenty-fourth annual ACM symposium on Theory of computing
Proceedings of the London Mathematical Society symposium on Boolean function complexity
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Design strategies for optimal hybrid final adders in a parallel multiplier
Journal of VLSI Signal Processing Systems - Special issue on VLSI arithmetic and implementations
Computer Arithmetic I (Tutorial)
Computer Arithmetic: Principles, Architecture and Design
Design strategies for the final adder in a parallel multiplier
ASILOMAR '95 Proceedings of the 29th Asilomar Conference on Signals, Systems and Computers (2-Volume Set)
Implementing Multiply-Accumulate Operation in Multiplication Time
ARITH '97 Proceedings of the 13th Symposium on Computer Arithmetic (ARITH '97)
Design and Clocking of VLSI Multipliers
High-Speed Booth Encoded Parallel Multiplier Design
IEEE Transactions on Computers - Special issue on computer arithmetic
An Optimal Allocation of Carry-Save-Adders in Arithmetic Circuits
IEEE Transactions on Computers
Layout-aware synthesis of arithmetic circuits
Proceedings of the 39th annual Design Automation Conference
Computer arithmetic and hardware: "off the shelf" microprocessors versus "custom hardware"
Theoretical Computer Science
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
Built-in Test with Modified-Booth High-Speed Pipelined Multipliers and Dividers
Journal of Electronic Testing: Theory and Applications
Leakage power minimization for the synthesis of parallel multiplier circuits
Proceedings of the 14th ACM Great Lakes symposium on VLSI
Divide-and-concatenate: an architecture level optimization technique for universal hash functions
Proceedings of the 41st annual Design Automation Conference
An integrated approach to timing-driven synthesis and placement of arithmetic circuits
Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Adding Limited Reconfigurability to Superscalar Processors
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Tight integration of timing-driven synthesis and placement of parallel multiplier circuits
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
High-Performance Low-Power Left-to-Right Array Multiplier Design
IEEE Transactions on Computers
ISVLSI '05 Proceedings of the IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design
Improved use of the carry-save representation for the synthesis of complex arithmetic circuits
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Towards the automatic exploration of arithmetic-circuit architectures
Proceedings of the 43rd annual Design Automation Conference
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A Simple High-Speed Multiplier Design
IEEE Transactions on Computers
Automatic synthesis of compressor trees: reevaluating large counters
Proceedings of the conference on Design, automation and test in Europe
Enhancing FPGA performance for arithmetic circuits
Proceedings of the 44th annual Design Automation Conference
Progressive decomposition: a heuristic to structure arithmetic circuits
Proceedings of the 44th annual Design Automation Conference
A novel FPGA logic block for improved arithmetic performance
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Efficient synthesis of compressor trees on FPGAs
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Partial product reduction by using look-up tables for M×N multiplier
Integration, the VLSI Journal
Improving synthesis of compressor trees on FPGAs via integer linear programming
Proceedings of the conference on Design, automation and test in Europe
An FPGA Logic Cell and Carry Chain Configurable as a 6:2 or 7:2 Compressor
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Iterative layering: optimizing arithmetic circuits by structuring the information flow
Proceedings of the 2009 International Conference on Computer-Aided Design
Dual channel addition based FFT processor architecture for signal and image processing
International Journal of High Performance Systems Architecture
Energy efficient implementation of parallel CMOS multipliers with improved compressors
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
Truncated binary multipliers with variable correction and minimum mean square error
IEEE Transactions on Circuits and Systems Part I: Regular Papers
Optimized design of parallel carry-select adders
Integration, the VLSI Journal
Multi-operand adder synthesis on FPGAs using generalized parallel counters
Proceedings of the 2010 Asia and South Pacific Design Automation Conference
New hardware architecture for bit-counting
ACOS'06 Proceedings of the 5th WSEAS international conference on Applied computer science
Power and delay aware synthesis of multi-operand adders targeting LUT-based FPGAs
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Compressor tree synthesis on commercial high-performance FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Synthesis of Adaptable Hybrid Adders for Area Optimization under Timing Constraint
ACM Transactions on Design Automation of Electronic Systems (TODAES)
We present new design and analysis techniques for synthesizing parallel multiplier circuits with smaller predicted delay than the best current multipliers. In [4], Oklobdzija et al. proposed the Three-Dimensional Method (TDM) for Partial Product Reduction Tree (PPRT) design, which produces multipliers that outperform the current best designs. The goal of TDM is to produce a minimum-delay PPRT using full adders. This is done by carefully modeling how the output delays of an adder depend on its input delays and then interconnecting the adders in a globally optimal way. Oklobdzija et al. gave a good heuristic for finding the optimal PPRT but no proofs about its performance. We provide a formal characterization of optimal PPRT circuits and prove a number of properties about them. For the problem of summing a set of input bits with minimum delay, we present an algorithm that produces a minimum-delay circuit in time linear in the size of the inputs. Our techniques allow us to prove tight lower bounds on multiplier circuit delays. These results are combined in a program that finds optimal TDM multiplier designs. Using this program, we show that, while the heuristic used in [4] does not always find the optimal TDM circuit, it performs very well in terms of overall PPRT circuit delay. However, our search algorithms find PPRT circuits that are better at reducing the delay of the entire multiplier.
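To make the idea of delay-aware adder interconnection concrete, the following is a minimal sketch of a greedy column-by-column reduction with full adders. It is not the algorithm of this paper or the heuristic of [4]: the delay constants, the function names, and the greedy rule (combine the three earliest-arriving signals, routing the latest of them to the fast input) are all assumptions chosen for illustration. It also uses a heap, so it runs in O(n log n) rather than linear time.

```python
import heapq

# Hypothetical full-adder delay model, in arbitrary gate-delay units.
# These values are assumptions for illustration, not figures from the paper.
D_AB_SUM = 2    # inputs a, b -> sum output
D_C_SUM = 1     # input c (the "fast" input) -> sum output
D_AB_CARRY = 2  # inputs a, b -> carry output
D_C_CARRY = 1   # input c -> carry output


def full_adder(a, b, c):
    """Return (sum_time, carry_time) given the three input arrival times."""
    sum_time = max(a + D_AB_SUM, b + D_AB_SUM, c + D_C_SUM)
    carry_time = max(a + D_AB_CARRY, b + D_AB_CARRY, c + D_C_CARRY)
    return sum_time, carry_time


def reduce_column(arrivals):
    """Greedily reduce one column to at most two signals.

    Returns (remaining, carries): arrival times of the signals left in
    this column and of the carries sent to the next column.
    """
    heap = list(arrivals)
    heapq.heapify(heap)
    carries = []
    while len(heap) > 2:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        c = heapq.heappop(heap)  # latest of the three uses the fast input
        sum_time, carry_time = full_adder(a, b, c)
        heapq.heappush(heap, sum_time)
        carries.append(carry_time)
    return sorted(heap), carries


def reduce_tree(columns):
    """Reduce all columns, least significant first, propagating carries."""
    result, carries = [], []
    i = 0
    while i < len(columns) or carries:
        column = (list(columns[i]) if i < len(columns) else []) + carries
        remaining, carries = reduce_column(column)
        result.append(remaining)
        i += 1
    return result


if __name__ == "__main__":
    # Column heights of a 4x4 multiplier; all partial products arrive at t=0.
    heights = [1, 2, 3, 4, 3, 2, 1]
    rows = reduce_tree([[0] * h for h in heights])
    for weight, signals in enumerate(rows):
        print(weight, signals)
```

Each output row lists the arrival times of the at most two signals per bit position that would feed the final adder. A real design flow would also consider half adders and would search over interconnections rather than commit to a single greedy choice.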