Computer Arithmetic Algorithms
Computer Arithmetic Algorithms
Architecture and CAD for Deep-Submicron FPGAs
Architecture and CAD for Deep-Submicron FPGAs
VPR: A new packing, placement and routing tool for FPGA research
FPL '97 Proceedings of the 7th International Workshop on Field-Programmable Logic and Applications
A CAD Suite for High-Performance FPGA Design
FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
FPGAs vs. CPUs: trends in peak floating-point performance
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance
FCCM '04 Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
The Stratix II logic and routing architecture
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Sparse Matrix-Vector multiplication on FPGAs
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Floating-point sparse matrix-vector multiply for FPGAs
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
A low cost, multithreaded processing-in-memory system
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
An Analysis of the Double-Precision Floating-Point FFT on FPGAs
FCCM '05 Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Open Source High Performance Floating-Point Modules
FCCM '06 Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
WSEAS Transactions on Circuits and Systems
Floating-point FPGA: architecture and modeling
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Reconfigurable custom floating-point instructions (abstract only)
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
Improving FPGA performance for carry-save arithmetic
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
An Automated Flow for Arithmetic Component Generation in Field-Programmable Gate Arrays
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Parallel processors architecture in FPGA for the solution of linear equations systems
ICOSSSE '09 Proceedings of the 8th WSEAS international conference on System science and simulation in engineering
Reducing the cost of floating-point mantissa alignment and normalization in FPGAs
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Real-time architecture for a robust multi-scale stereo engine on FPGA
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Optimizing floating point units in hybrid FPGAs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Integration, the VLSI Journal
Hi-index | 0.00 |
With the density of field-programmable gate arrays (FPGAs) steadily increasing, FPGAs have reached the point where they are capable of implementing complex floating-point applications. However, their general-purpose nature has limited the use of FPGAs in scientific applications that require floating-point arithmetic due to the large amount of FPGA resources that floating-point operations still require. This paper considers three architectural modifications that make floating-point operations more efficient on FPGAs. The first modification embeds floating-point multiply-add units in an island-style FPGA. While offering a dramatic reduction in area and improvement in clock rate, these embedded units are a significant change and may not be justified by the market. The next two modifications target a major component of IEEE compliant floating-point computations: variable length shifters. The first alternative to lookup tables (LUTs) for implementing the variable length shifters is a coarse-grained approach: embedded variable length shifters in the FPGA fabric. These shifters offer a significant reduction in area with a modest increase in clock rate and are smaller and more general than embedded floating-point units. The next alternative is a fine-grained approach: adding a 4:1 multiplexer unit inside a configurable logic block (CLB), in parallel to each 4-LUT. While this offers the smallest overall area improvement, it does offer a significant improvement in clock rate with only a trivial increase in the size of the CLB.