64-bit floating-point FPGA matrix multiplication

  • Authors:
  • Yong Dou;S. Vassiliadis;G. K. Kuzmanov;G. N. Gaydadjiev

  • Affiliations:
  • National Laboratory for Parallel and Distributed Processing, Changsha, P.R. China;EEMCS, TU Delft, Delft, The Netherlands;EEMCS, TU Delft, Delft, The Netherlands;EEMCS, TU Delft, Delft, The Netherlands

  • Venue:
  • Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

We introduce a 64-bit ANSI/IEEE Std 754-1985 floating point design of a hardware matrix multiplier optimized for FPGA implementations. A general block matrix multiplication algorithm, applicable for an arbitrary matrix size is proposed. The algorithm potentially enables optimum performance by exploiting the data locality and reusability incurred by the general matrix multiplication scheme and considering the limitations of the I/O bandwidth and the local storage volume. We implement a scalable linear array of processing elements (PE) supporting the proposed algorithm in the Xilinx Virtex II Pro technology. Synthesis results confirm a superior performance-area ratio compared to related recent works. Assuming the same FPGA chip, the same amount of local memory, and the same I/O bandwidth, our design outperforms related proposals by at least 1.7X and up to 18X consuming the least reconfigurable resources. A total of 39 PEs can be integrated into the xc2vp125-7 FPGA, reaching performance of, e.g., 15.6 GFLOPS with 1600 KB local memory and 400 MB/s external memory bandwidth.