Tradeoffs of Designing Floating-Point Division and Square Root on Virtex FPGAs
FCCM '03 Proceedings of the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
A quantitative analysis of the speedup factors of FPGAs over processors
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
Low Latency Digit-Recurrence Reciprocal and Square-Root Reciprocal Algorithm and Architecture
ARITH '05 Proceedings of the 17th IEEE Symposium on Computer Arithmetic
A Reconfigurable Architecture for Wireless Communication Systems
ITNG '06 Proceedings of the Third International Conference on Information Technology: New Generations
Advanced Components in the Variable Precision Floating-Point Library
FCCM '06 Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Pipelining of double precision floating point division and square root operations
Proceedings of the 44th annual Southeast regional conference
An FPGA implementation of pipelined multiplicative division with IEEE Rounding
FCCM '07 Proceedings of the 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Floating-point divider design for FPGAs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
FPGA-Based High-Performance and Scalable Block LU Decomposition Architecture
IEEE Transactions on Computers
Hi-index | 0.00 |
Floating point arithmetic (FPA) are very crucial and critical domain for the hardware acceleration. FPA are widely used in the vast field of application. The division operation of the FPA is a very intensive operation, in terms of complexity, area requirement and performance speed. This paper presents an efficient FPGA implementation of double-precision FPA divisions on Virtex-2pro FPGA platform, for the ease of comparing with prior works. The proposed method is based on the method of binomial expansion, which uses look-up tables and partial block multipliers (PBM). Compared with previously reported work, the proposed design occupies smaller area (in terms of number slices, number of multipliers and the BRAM usage) with a higher performance gain and less latency. By using over 5 million unique random test cases, our results show that the proposed design gives an average error of less than 0.5 ULP (unit at last place), and a maximum error of 2 ULP without using any rounding scheme. However, rounding can also be added to the design to restore some accuracy at a slight cost in area.