Low latency, high throughput and small area are three major design considerations of an FPGA design. In this paper, we present a high radix SRT division algorithm and a binary restoring square root algorithm. We describe three implementations of floating-point division operations with variable width and precision based on Virtex-2 FPGAs. One is a low cost iterative implementation; another is a low latency array implementation; and the third is a high throughput pipelined implementation. The implementations of floating-point square root operations are presented as well. In addition to presenting the design of these modules, we analyze the tradeoffs among cost, latency and throughput, with strategies for reducing the cost or improving the performance.
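
For readers unfamiliar with the binary restoring square root named above, the following is a minimal software sketch of the general digit-by-digit technique. It is not the paper's hardware design: the function name `restoring_sqrt`, its parameters, and the use of a plain unsigned integer operand (rather than a floating-point mantissa) are assumptions made for illustration only.

```python
def restoring_sqrt(radicand: int, result_bits: int) -> tuple[int, int]:
    """Binary restoring (digit-by-digit) integer square root.

    Illustrative sketch only. Assumes 0 <= radicand < 4**result_bits.
    Returns (root, remainder) such that root*root + remainder == radicand.
    """
    root = 0
    remainder = 0
    # Produce one result bit per iteration, most significant bit first.
    for i in range(result_bits - 1, -1, -1):
        # Shift the next two radicand bits into the partial remainder.
        remainder = (remainder << 2) | ((radicand >> (2 * i)) & 0b11)
        # Trial subtrahend 4*root + 1 tests whether the next root bit is 1.
        trial = (root << 2) | 1
        if remainder >= trial:
            # Subtraction succeeds: keep the difference, result bit is 1.
            remainder -= trial
            root = (root << 1) | 1
        else:
            # Subtraction would go negative: the old remainder is kept
            # ("restored"), and the result bit is 0.
            root = root << 1
    return root, remainder


if __name__ == "__main__":
    print(restoring_sqrt(25, 3))  # (5, 0)
    print(restoring_sqrt(30, 3))  # (5, 5)
```

Each loop iteration yields exactly one result bit using a shift, a compare and a subtract. That regularity is what lets the same recurrence be reused over many cycles in an iterative design, or unrolled into one stage per bit in an array or pipelined design, which is the kind of cost, latency and throughput tradeoff the abstract refers to.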