IEEE Transactions on Computers
A Comparison of Three Rounding Algorithms for IEEE Floating-Point Multiplication
IEEE Transactions on Computers - Special issue on computer arithmetic
A flexible floating-point format for optimizing data-paths and operators in FPGA based DSPs
FPGA '02 Proceedings of the 2002 ACM/SIGDA tenth international symposium on Field-programmable gate arrays
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Automating Customisation of Floating-Point Designs
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
A Library of Parameterized Floating-Point Modules and Their Use
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
A Re-evaluation of the Practicality of Floating-Point Operations on FPGAs
FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
A CAD Suite for High-Performance FPGA Design
FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
1-GHz HAL SPARC64® Dual Floating Point Unit with RAS Features
ARITH '01 Proceedings of the 15th IEEE Symposium on Computer Arithmetic
On the Design of Fast IEEE Floating-Point Adders
ARITH '01 Proceedings of the 15th IEEE Symposium on Computer Arithmetic
Quantitative analysis of floating point arithmetic on FPGA based custom computing machines
FCCM '95 Proceedings of the IEEE Symposium on FPGA's for Custom Computing Machines
Tradeoffs of Designing Floating-Point Division and Square Root on Virtex FPGAs
FCCM '03 Proceedings of the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Floating Point Unit Generation and Evaluation for FPGAs
FCCM '03 Proceedings of the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
FPGAs vs. CPUs: trends in peak floating-point performance
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance
FCCM '04 Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Sparse Matrix-Vector multiplication on FPGAs
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Floating-point sparse matrix-vector multiply for FPGAs
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
64-bit floating-point FPGA matrix multiplication
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
FPU Implementations with Denormalized Numbers
IEEE Transactions on Computers
An Analysis of the Double-Precision Floating-Point FFT on FPGAs
FCCM '05 Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Floating-point multiply-add-fused with reduced latency
IEEE Transactions on Computers
Self-Alignment Schemes for the Implementation of Addition-Related Floating-Point Operators
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Hi-index | 0.00 |
Floating-point applications are a growing trend in the FPGA community. As such, it has become critical to create floating-point units optimized for standard FPGA technology. Unfortunately, the FPGA design space is very different from the VLSI design space; thus, optimizations for FPGAs can differ significantly from optimizations for VLSI. In particular, the FPGA environment constrains the design space such that only limited parallelism can be effectively exploited to reduce latency. Obtaining the right balances between clock speed, latency, and area in FPGAs can be particularly challenging. This article presents implementation details for an IEEE-754 standard floating-point adder and multiplier for FPGAs. The designs presented here enable a Xilinx Virtex4 FPGA (-11 speed grade) to achieve 270 MHz IEEE compliant double precision floating-point performance with a 9-stage adder pipeline and 14-stage multiplier pipeline. The area requirement is approximately 500 slices for the adder and under 750 slices for the multiplier.