Architectural modifications to enhance the floating-point performance of FPGAs

Authors:
Michael J. Beauchamp;Scott Hauck;Keith D. Underwood;K. Scott Hemmert
Affiliations:
MIPS Technologies, Mountain View, CA;Department of Electrical Engineering, University of Washington, Seattle, WA;Scalable Computing Systems, Sandia National Laboratories, Albuquerque, NM;Scalable Computing Systems, Sandia National Laboratories, Albuquerque, NM
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year:
2008

Abstract

With the density of field-programmable gate arrays (FPGAs) steadily increasing, FPGAs have reached the point where they are capable of implementing complex floating-point applications. However, their general-purpose nature has limited the use of FPGAs in scientific applications that require floating-point arithmetic, because floating-point operations still consume a large amount of FPGA resources. This paper considers three architectural modifications that make floating-point operations more efficient on FPGAs. The first modification embeds floating-point multiply-add units in an island-style FPGA. While offering a dramatic reduction in area and improvement in clock rate, these embedded units are a significant change and may not be justified by the market. The next two modifications target a major component of IEEE-compliant floating-point computations: variable-length shifters. The first alternative to lookup tables (LUTs) for implementing the variable-length shifters is a coarse-grained approach: embedding variable-length shifters in the FPGA fabric. These shifters offer a significant reduction in area with a modest increase in clock rate and are smaller and more general than embedded floating-point units. The next alternative is a fine-grained approach: adding a 4:1 multiplexer unit inside a configurable logic block (CLB), in parallel to each 4-LUT. While this offers the smallest overall area improvement, it does offer a significant improvement in clock rate with only a trivial increase in the size of the CLB.
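
To make the fine-grained option concrete, the following is a minimal behavioral sketch in Python, not the paper's design. It models a variable-length logical right shifter built from stages of 4:1 multiplexers, where each stage consumes two bits of the shift amount. The function names (`mux4`, `shift_right_radix4`, `to_bits`, `from_bits`) and the LSB-first bit-list representation are assumptions made for illustration only. The point it illustrates: a 4-input LUT has too few inputs for a 4:1 multiplexer (four data bits plus two select bits), so a LUT-only shifter falls back on 2:1 stages, one per bit of the shift amount, whereas 4:1 stages need roughly half as many levels.

```python
def mux4(d0, d1, d2, d3, sel):
    """4:1 multiplexer: select one of four data bits with a 2-bit select."""
    return (d0, d1, d2, d3)[sel]


def to_bits(value, width):
    """Convert an integer to a list of bits, index 0 = least significant."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Convert an LSB-first bit list back to an integer."""
    return sum(b << i for i, b in enumerate(bits))


def shift_right_radix4(bits, amount):
    """Logical right shift using 4:1 mux stages (hypothetical model).

    Stage s shifts by 0, 1, 2, or 3 times 4**s positions, selected by
    bits 2s and 2s+1 of `amount`. Returns (shifted bits, stage count).
    Assumes 0 <= amount < len(bits).
    """
    width = len(bits)
    stage, step = 0, 1
    while step < width:
        sel = (amount >> (2 * stage)) & 0b11
        prev = bits
        bits = [
            mux4(
                prev[i] if i < width else 0,
                prev[i + step] if i + step < width else 0,
                prev[i + 2 * step] if i + 2 * step < width else 0,
                prev[i + 3 * step] if i + 3 * step < width else 0,
                sel,
            )
            for i in range(width)
        ]
        step *= 4
        stage += 1
    return bits, stage


if __name__ == "__main__":
    out, stages = shift_right_radix4(to_bits(0xBEEF, 16), 5)
    print(hex(from_bits(out)), "in", stages, "mux stages")
```

In this model a 64-bit shift takes three 4:1 stages, compared with six stages when only 2:1 multiplexers are available. This is consistent with the clock-rate improvement the abstract reports, although the sketch models neither area nor timing.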