Automatic synthesis of compressor trees: reevaluating large counters
Proceedings of the conference on Design, automation and test in Europe
Enhancing FPGA performance for arithmetic circuits
Proceedings of the 44th annual Design Automation Conference
A novel FPGA logic block for improved arithmetic performance
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
An FPGA Logic Cell and Carry Chain Configurable as a 6:2 or 7:2 Compressor
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Synthesizing Parsimonious Inexact Circuits through Probabilistic Design Techniques
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Probabilistic Embedded Computing
Hi-index | 0.00 |
Logic synthesis has made impressive progress in the last decade and has pervaded digital design replacing almost universally manual techniques. A remarkable exception is computer arithmetic and datapath design, where designers still rely mostly on well studied architectures; on datapaths, in most cases, logic synthesis plays at most a minor role in the optimisation of netlists. A case in point is multiple additions performed in carry-save form, such as those fundamentally constituting parallel multipliers: column compressors are usually built exploiting the regularity of the circuit and, due to the very large number of XOR operations, are hardly optimised further by logic synthesisers. In fact, due to the shortcomings of algebraic factoring, XOR operations are usually left untouched by logic synthesisers. In this paper we show a general technique to optimise XOR dominated circuits and we demonstrate its effectiveness on multiplier-like circuits. We show that it optimises significantly the best parallel multipliers by exploiting complex dependencies between the addenda which escape known manual optimisations. Netlists corresponding to top arithmetic architectures can either be synthesised directly or preprocessed through our technique before standard logic synthesis: our preprocessing stage makes it possible to achieve some 20% speed improvement. To our best knowledge, optimisations of the type we show for multiplier-like structures have never been reported - neither manually derived, in computer arithmetic literature, nor automatically derived, in design automation literature.