Challenges in CAD for the one million gate FPGA
FPGA '97 Proceedings of the 1997 ACM fifth international symposium on Field-programmable gate arrays
Implementation of single precision floating point square root on FPGAs
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Tradeoffs of Designing Floating-Point Division and Square Root on Virtex FPGAs
FCCM '03 Proceedings of the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Modular array structure for non-restoring square root circuit
Journal of Systems Architecture: the EUROMICRO Journal
VFloat: A Variable Precision Fixed- and Floating-Point Library for Reconfigurable Hardware
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Journal of Signal Processing Systems
High performance reconfigurable architecture for double precision floating point division
ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
Space applications rely increasingly on high data rate DSP algorithms, which use double-precision floating-point arithmetic. While most DSP applications can be compiled for DSP processors, high data rate DSP computations require novel implementation technologies to sustain their throughput. Only recently have FPGA gate densities reached a level that makes them attractive platforms for compute-intensive DSP applications. In this context, this paper presents sequential and pipelined designs of a double-precision floating-point divider and square root unit on FPGAs. Unlike pipelined parallel implementations, the pipelining of these units is based on unrolling the iterations of low-radix digit recurrence algorithms. The units are mapped onto generic FPGA reconfigurable fabric without taking advantage of any of the advanced architectural components available in high-capacity FPGAs. The implementations show that their performance is comparable to, and sometimes higher than, that of non-iterative designs based on high-radix numbers. The iterative divider and square root unit each occupy less than 1% of an XC2V6000 FPGA chip, while their pipelined counterparts reach throughputs of 100 MFLOPS while consuming a modest 8% of the chip area.
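The abstract names low-radix digit recurrence but does not spell out the recurrences. As a rough illustration only, the sketch below shows textbook radix-2 restoring recurrences for integer division and square root in Python. It is not the paper's design: the function names `divide_radix2` and `sqrt_radix2`, the integer operands, and the restoring variant are all assumptions made for this example. Each loop iteration produces one result bit, which is the step a sequential unit would repeat and a pipelined unit would unroll into one stage per iteration.

```python
def divide_radix2(dividend: int, divisor: int, bits: int) -> tuple[int, int]:
    """Hypothetical helper: restoring radix-2 division, one quotient bit per iteration.

    Assumes 0 <= dividend < 2**bits and divisor > 0.
    Returns (quotient, remainder).
    """
    if divisor <= 0 or not 0 <= dividend < (1 << bits):
        raise ValueError("operands out of range")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Shift the partial remainder left and bring down the next dividend bit.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # Trial subtraction: keep it only if the result stays non-negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


def sqrt_radix2(radicand: int, bits: int) -> tuple[int, int]:
    """Hypothetical helper: restoring radix-2 square root, one root bit per iteration.

    Assumes 0 <= radicand < 4**bits, where bits is the number of root bits.
    Returns (root, remainder) with root*root + remainder == radicand.
    """
    if not 0 <= radicand < (1 << (2 * bits)):
        raise ValueError("radicand out of range")
    root = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next pair of radicand bits.
        remainder = (remainder << 2) | ((radicand >> (2 * i)) & 3)
        # Candidate subtrahend 4*root + 1 corresponds to appending a 1 bit to root.
        trial = (root << 2) | 1
        root <<= 1
        if remainder >= trial:
            remainder -= trial
            root |= 1
    return root, remainder


if __name__ == "__main__":
    print(divide_radix2(100, 7, 8))  # (14, 2)
    print(sqrt_radix2(10, 2))        # (3, 1)
```

In a floating-point unit the same recurrences would run on the significands, with exponent handling, normalization, and rounding done separately; those steps are omitted here.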