Dot-products are an essential, recurrent building block in scientific computing, and often occupy a large proportion of scientific acceleration circuitry. Dot-product acceleration is well suited to Field Programmable Gate Arrays (FPGAs), since these devices can be configured to exploit wide parallelism, deep pipelining, and highly efficient datapaths. In this paper we present a dot-product implementation that operates on a hybrid floating-point and fixed-point number system. The design receives floating-point inputs and produces a floating-point output; internally, it uses a fixed-point number system with a configurable word length, so the internal representation can be tuned to the desired accuracy. Results on a high-end Xilinx FPGA for an order-150 dot-product demonstrate that, for equivalent accuracy metrics, the design uses 3.8 times fewer resources, runs at a 1.62 times higher clock frequency, and achieves a significant reduction in latency compared with a dot-product built directly from floating-point cores. Combining these gains and using the spare resources to instantiate more units in parallel yields an overall speed-up of at least 5 times.
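The hybrid scheme described above can be modeled in software as follows: floating-point inputs are quantized to a fixed-point representation with a chosen number of fractional bits, products are accumulated in a single wide integer (mirroring a wide fixed-point accumulator in hardware), and only the final sum is converted back to floating point. This is a minimal illustrative sketch, not the paper's actual hardware design; the `FRAC_BITS` value and function names are hypothetical, and a real FPGA implementation would align mantissas and size the accumulator in logic rather than use arbitrary-precision Python integers.

```python
# Software model of a hybrid floating-point / fixed-point dot-product:
# float in -> fixed-point accumulation -> float out.
FRAC_BITS = 40  # hypothetical internal fractional word length (tunable for accuracy)

def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a float to a signed fixed-point integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))

def hybrid_dot(a, b, frac_bits=FRAC_BITS):
    """Dot-product with floating-point I/O and fixed-point internal accumulation."""
    acc = 0  # wide fixed-point accumulator with 2*frac_bits fractional bits
    for x, y in zip(a, b):
        acc += to_fixed(x, frac_bits) * to_fixed(y, frac_bits)
    # Single conversion back to floating point at the output
    return acc / float(1 << (2 * frac_bits))
```

Because all intermediate sums stay in one fixed-point accumulator, the model avoids the per-addition rounding and normalization of a floating-point adder tree, which is the source of the resource and latency savings the abstract reports.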