The performance of the Sweeney, Robertson, Tocher (SRT) division algorithm depends on two parameters: the radix r and the redundancy factor ρ. This paper studies the effect of these parameters on division performance. At each iteration, the SRT algorithm performs a multiplication by the quotient digit q_{i+1}. This multiplication reduces to a simple shift when the digit is a power of two (q_{i+1} = 2^k); otherwise, the SRT iteration needs a multiplier. We propose an approach that avoids this multiplication by decomposing the quotient digit q_{i+1} into two or three terms, each a power of two. The multiplication is then carried out with simple shifts and a carry-save addition. Implemented on Virtex-II field-programmable gate array (FPGA) devices, this approach performs better than the approach that uses the embedded 18 × 18-bit multipliers. The iteration delay is independent of operand size, and the reduction-tree delay is at most equivalent to the delay of two Virtex-II slices. The approach was tested for radices 4, 8, and 16, with both minimum and maximum redundancy factors. We conclude that, with our approach, radix 8 with the maximum redundancy factor gives the best performance for double-precision SRT division.
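The digit decomposition described above can be illustrated with a small software sketch. This is not the paper's hardware design: it assumes a signed power-of-two decomposition (the non-adjacent form), which may differ from the decomposition the authors chose, and it uses ordinary integer addition where the hardware would use a carry-save adder. The names `signed_power_terms`, `multiply_by_digit`, and `max_digit` are hypothetical, and `max_digit` relies on the standard SRT definition ρ = a / (r − 1), where a is the largest digit magnitude.

```python
def signed_power_terms(q):
    """Split integer q into (sign, shift) pairs with sum(sign * 2**shift) == q.

    Assumption: uses the non-adjacent form (NAF), chosen here only because it
    keeps the number of terms small; the paper may use another decomposition.
    """
    terms = []
    sign_q = -1 if q < 0 else 1
    n = abs(q)
    shift = 0
    while n:
        if n & 1:
            d = 2 - (n & 3)          # +1 if n % 4 == 1, -1 if n % 4 == 3
            terms.append((sign_q * d, shift))
            n -= d
        n >>= 1
        shift += 1
    return terms


def multiply_by_digit(q, divisor):
    """Compute q * divisor using only shifts and additions/subtractions.

    Python's sum() stands in for the carry-save adder of a hardware design.
    """
    return sum(sign * (divisor << shift) for sign, shift in signed_power_terms(q))


def max_digit(radix, maximum_redundancy):
    """Largest digit magnitude a: r - 1 (rho = 1) or r / 2 (minimum redundancy)."""
    return radix - 1 if maximum_redundancy else radix // 2


if __name__ == "__main__":
    for radix in (4, 8, 16):
        a = max_digit(radix, maximum_redundancy=True)
        worst = max(len(signed_power_terms(q)) for q in range(-a, a + 1))
        print(f"radix {radix}: digits in [-{a}, {a}], at most {worst} shifted terms")
```

Under these assumptions, every digit of the maximally redundant digit sets for radices 4, 8, and 16 needs at most three shifted terms, which matches the "two or three terms" stated in the abstract.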