Fast, Efficient Floating-Point Adders and Multipliers for FPGAs

Authors:
K. Scott Hemmert; Keith D. Underwood
Affiliations:
Sandia National Laboratories; Intel Corporation
Venue:
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Year:
2010

Abstract

Floating-point applications are increasingly common in the FPGA community, so it has become critical to create floating-point units optimized for standard FPGA technology. Unfortunately, the FPGA design space is very different from the VLSI design space; optimizations for FPGAs can therefore differ significantly from optimizations for VLSI. In particular, the FPGA environment constrains the design space such that only limited parallelism can be effectively exploited to reduce latency. Striking the right balance among clock speed, latency, and area in FPGAs can be particularly challenging. This article presents implementation details for an IEEE 754 floating-point adder and multiplier for FPGAs. The designs presented here enable a Xilinx Virtex-4 FPGA (-11 speed grade) to achieve 270 MHz IEEE-compliant double-precision floating-point performance with a 9-stage adder pipeline and a 14-stage multiplier pipeline. The area requirement is approximately 500 slices for the adder and under 750 slices for the multiplier.
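To put the quoted figures in perspective, the short sketch below converts the pipeline depths and clock rate from the abstract into latencies. It assumes that each pipeline stage takes exactly one clock cycle, which the abstract does not state explicitly; the function and variable names are illustrative and do not come from the article.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 270e6          # 270 MHz on a Virtex-4, -11 speed grade
ADDER_STAGES = 9          # adder pipeline depth
MULTIPLIER_STAGES = 14    # multiplier pipeline depth


def pipeline_latency_ns(stages: int, clock_hz: float) -> float:
    """Latency in nanoseconds, assuming one clock cycle per pipeline stage."""
    return stages / clock_hz * 1e9


adder_latency_ns = pipeline_latency_ns(ADDER_STAGES, CLOCK_HZ)
multiplier_latency_ns = pipeline_latency_ns(MULTIPLIER_STAGES, CLOCK_HZ)

print(f"adder latency:      {adder_latency_ns:.1f} ns")       # about 33.3 ns
print(f"multiplier latency: {multiplier_latency_ns:.1f} ns")  # about 51.9 ns
```

Under that assumption, a result emerges roughly 33 ns after an addition is issued and roughly 52 ns after a multiplication is issued.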