The implementation of high-precision floating-point applications on reconfigurable hardware requires large multipliers. Full multipliers are the core of floating-point multipliers. Truncated multipliers, which accept a well-controlled accuracy degradation in exchange for reduced resource usage, are useful building blocks in situations where a full multiplier is not needed. This work studies the automated generation of such multipliers using the embedded multipliers and adders present in the DSP blocks of current FPGAs. Their optimization is expressed as a tiling problem, in which a tile represents a hardware multiplier and a super-tile represents a combination of several hardware multipliers and adders that makes efficient use of the DSP internal resources. This tiling technique is shown to adapt to both full and truncated multipliers. It addresses arbitrary precisions, including single and double precision as well as the quadruple precision introduced by the IEEE 754-2008 standard and currently unsupported by processor hardware. An open-source implementation is provided in the FloPoCo project.
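The tiling view of a multiplier can be illustrated with a small software model. The sketch below is not the FloPoCo generator and does not model super-tiles or DSP-internal adders. It only assumes that the partial-product array of a `w_x`-by-`w_y` bit multiplication is covered by a regular grid of rectangular tiles, each standing for one hardware multiplier, and that a truncated multiplier omits every tile lying entirely below a chosen bit weight. The function names, the 17-by-17 tile size, and the `drop_below` parameter are hypothetical choices made for illustration.

```python
def tiles(w_x, w_y, t_w, t_h):
    """Yield (x_lo, y_lo, width, height) rectangles that cover the
    w_x-by-w_y partial-product array with a regular grid of tiles."""
    for x_lo in range(0, w_x, t_w):
        for y_lo in range(0, w_y, t_h):
            yield x_lo, y_lo, min(t_w, w_x - x_lo), min(t_h, w_y - y_lo)


def tiled_multiply(x, y, w_x, w_y, t_w, t_h, drop_below=0):
    """Sum the sub-products of all tiles, skipping every tile whose
    partial products all have a bit weight below drop_below.

    Returns (result, number_of_tiles_used). With drop_below=0 no tile
    is skipped and the result is the exact product.
    """
    total = 0
    used = 0
    for x_lo, y_lo, w, h in tiles(w_x, w_y, t_w, t_h):
        # Largest weight of a partial product x_i * y_j inside this tile.
        top_weight = (x_lo + w - 1) + (y_lo + h - 1)
        if top_weight < drop_below:
            continue  # truncated multiplier: this tile is not implemented
        x_chunk = (x >> x_lo) & ((1 << w) - 1)
        y_chunk = (y >> y_lo) & ((1 << h) - 1)
        # One small multiplier, shifted to its position in the product.
        total += (x_chunk * y_chunk) << (x_lo + y_lo)
        used += 1
    return total, used


if __name__ == "__main__":
    # Illustrative operands sized like double-precision significands (53 bits).
    x = (1 << 53) - 1
    y = (1 << 52) + 12345
    full, n_full = tiled_multiply(x, y, 53, 53, 17, 17)
    approx, n_used = tiled_multiply(x, y, 53, 53, 17, 17, drop_below=53)
    print("tiles (full / truncated):", n_full, "/", n_used)
    print("full product exact:", full == x * y)
    print("truncation error:", x * y - approx)
```

In this model each omitted tile contributes less than `2**(drop_below + 1)` to the error, so the total error is bounded by the number of omitted tiles times that amount. A real truncated multiplier would choose the omitted tiles, and possibly add a correction constant, so that the error stays within the accuracy target of the floating-point operator. That selection is the optimization problem the abstract refers to, and it is not attempted here.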