The Krawczyk algorithm: rigorous bounds for linear equation solution on an FPGA
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
It has been shown that FPGAs can outperform high-end microprocessors on floating-point computations thanks to massive parallelism. However, most previous studies simply re-implement in the FPGA the operators present in a processor. This conservative approach is relatively straightforward, but it does not exploit the greater flexibility of the FPGA. We survey the many ways in which the FPGA implementation of a given floating-point computation can be not only faster but also more accurate than its microprocessor counterpart. Techniques studied here include custom precision, mixing and matching fixed- and floating-point arithmetic, specific accumulator design, dedicated architectures for coarser operators that are implemented as software on processors (such as elementary functions or Euclidean norms), operator specialization such as constant multiplication, and others. The FloPoCo project (http://www.ens-lyon.fr/LIP/Arenaire/Ware/FloPoCo/) aims at providing such non-standard operators. As a conclusion, current FPGA fabrics could be enhanced to improve floating-point performance. However, these enhancements should not take the form of hard FPU blocks, as others have suggested. Instead, what is needed is smaller building blocks that are more generally useful to the implementation of floating-point operators, such as cascadable barrel shifters and leading-zero counters.