Peak performance model for a custom precision floating-point dot product on FPGAs

Authors:
Manfred Mücke;Bernd Lesser;Wilfried N. Gansterer
Affiliations:
University of Vienna, Research Lab Computational Technologies and Applications;University of Vienna, Research Lab Computational Technologies and Applications;University of Vienna, Research Lab Computational Technologies and Applications
Venue:
Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing
Year:
2010

Abstract

FPGAs have the native property that reduced resource usage of individual operators translates directly into additional parallelism. For floating-point (FP) operators, such reduced resource usage can be achieved by reducing the mantissa bit width. The work presented here pursues two objectives. First, the maximum number of operands of a parallel dot product architecture is explored experimentally on an FPGA for different custom precision FP number formats. Given the resources of this FPGA, it is shown that, based on non-pipelined basic FP operators, a dot product for input vector sizes 21, 57 and 123 can be implemented for double, single and half precision, respectively. These correspond to peak performances of 1, 3.2 and 9.9 GFlop/s, respectively. Second, it is shown that the maximum dot product peak performance as a function of the precision used can be modeled by a function of the form $P(p) = c_1 + c_2 \cdot p^{c_3}$, given a certain type of FPGA, library and synthesis settings. Fitting experimental data to this model reveals similarities as well as differences among generations of devices.
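To relate the vector sizes to the quoted GFlop/s figures, the sketch below counts the operations in one parallel dot product and back-computes the clock frequency those figures would imply. It rests on assumptions that are ours, not stated in the abstract: a length-n dot product costs n multiplications plus n - 1 additions, the products are summed by a balanced binary adder tree, and the datapath completes one dot product per clock cycle, so peak = (2n - 1) * f_clk. The helper names are hypothetical.

```python
import math

# ASSUMPTION (not from the paper): one length-n dot product = n mults + (n - 1) adds,
# and the datapath completes one dot product per clock cycle.

def dot_product_flops(n: int) -> int:
    """Floating-point operations in one length-n dot product."""
    return 2 * n - 1

def adder_tree_depth(n: int) -> int:
    """Number of levels in a balanced binary adder tree summing n products."""
    return math.ceil(math.log2(n)) if n > 1 else 0

def implied_clock_mhz(n: int, peak_gflops: float) -> float:
    """Clock frequency in MHz that would yield peak_gflops under the assumption above."""
    return peak_gflops * 1e3 / dot_product_flops(n)

# Vector sizes and peak performance as quoted in the abstract.
configs = [("double", 21, 1.0), ("single", 57, 3.2), ("half", 123, 9.9)]
for fmt, n, peak in configs:
    print(f"{fmt:6s} n={n:3d} flops/result={dot_product_flops(n):3d} "
          f"tree depth={adder_tree_depth(n)} implied f_clk approx "
          f"{implied_clock_mhz(n, peak):.1f} MHz")
```

If the paper counts operations differently (for example 2n per dot product) or the design does not complete one result per cycle, the implied frequencies change accordingly; the point is only that operand count, flops per result and clock rate jointly determine peak performance.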
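The model $P(p) = c_1 + c_2 \cdot p^{c_3}$ is linear in $c_1$ and $c_2$ once the exponent $c_3$ is fixed, which suggests a simple fitting strategy: scan $c_3$ over a grid and solve an ordinary least-squares problem for $(c_1, c_2)$ at each grid point. The sketch below applies this to the three data points quoted in the abstract. Two assumptions are ours: the fitting procedure itself (the paper's method is not described here), and the choice of $p$ as the significand width including the hidden bit (53, 24 and 11 for double, single and half precision). The resulting constants are illustrative only and are not the paper's reported fit values.

```python
# Minimal sketch: fit P(p) = c1 + c2 * p**c3 by grid search over c3
# plus closed-form linear least squares for (c1, c2).
# ASSUMPTIONS: p = significand bits incl. hidden bit; fitting method is our own.

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def fit_power_model(ps, perf, c3_lo=-4.0, c3_hi=-0.01, steps=8000):
    """Return (c1, c2, c3) minimizing the sum of squared errors over a c3 grid."""
    best = None
    for i in range(steps + 1):
        c3 = c3_lo + (c3_hi - c3_lo) * i / steps
        xs = [p ** c3 for p in ps]          # model is linear in these features
        c1, c2 = fit_linear(xs, perf)
        sse = sum((c1 + c2 * x - y) ** 2 for x, y in zip(xs, perf))
        if best is None or sse < best[0]:
            best = (sse, c1, c2, c3)
    return best[1:]

def model(p, c1, c2, c3):
    """Peak performance (GFlop/s) predicted for precision p."""
    return c1 + c2 * p ** c3

precisions = [53, 24, 11]        # double, single, half (assumed definition of p)
peak_gflops = [1.0, 3.2, 9.9]    # peak performance quoted in the abstract
c1, c2, c3 = fit_power_model(precisions, peak_gflops)
print(f"c1={c1:.3f}  c2={c2:.1f}  c3={c3:.3f}")
for p, y in zip(precisions, peak_gflops):
    print(p, y, round(model(p, c1, c2, c3), 3))
```

With three data points and three parameters the fit is essentially exact, so this only demonstrates the mechanics; distinguishing device generations, as the abstract describes, requires more precisions per device. The search range is restricted to negative exponents because peak performance falls as precision grows.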