A High Throughput FPGA-Based Floating Point Conjugate Gradient Implementation

  • Authors:
  • Antonio Roldao Lopes;George A. Constantinides

  • Affiliations:
  • Electrical & Electronic Engineering, Imperial College London, London, England SW7 2BT;Electrical & Electronic Engineering, Imperial College London, London, England SW7 2BT

  • Venue:
  • ARC '08 Proceedings of the 4th international workshop on Reconfigurable Computing: Architectures, Tools and Applications
  • Year:
  • 2008

Abstract

As Field Programmable Gate Arrays (FPGAs) have reached capacities beyond millions of equivalent gates, it has become possible to accelerate floating-point scientific computing applications. One type of calculation that is commonplace in scientific computation is the solution of systems of linear equations. A method that has proven in software to be very efficient and robust for finding such solutions is the Conjugate Gradient algorithm. In this paper we present a parallel hardware Conjugate Gradient implementation. The implementation is particularly suited to accelerating multiple small to medium-sized dense systems of linear equations. Through parallelization, the computation time per iteration for an order n matrix is reduced from Θ(n²) cycles for a software implementation to Θ(n). I/O requirements are scalable and converge to a constant value as the matrix order increases. Results on a VirtexII-6000 demonstrate sustained performance of 5 GFLOPS, and projected results on a Virtex5-330 indicate sustained performance of 35 GFLOPS. The former result is comparable to the performance of high-end CPUs, whereas the latter represents a significant speedup.
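For reference, the textbook Conjugate Gradient iteration for a dense symmetric positive-definite system can be sketched in plain Python. This is a minimal software sketch, not the authors' hardware design. The function names (dot, matvec, conjugate_gradient) and the tol and max_iter parameters are hypothetical choices made for illustration.

```python
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors (n multiply-adds)."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product: n row dot products, Theta(n^2) work in total."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A: Matrix, b: Vector, tol: float = 1e-10,
                       max_iter: Optional[int] = None) -> Vector:
    """Solve A x = b, assuming A is dense, symmetric and positive-definite."""
    n = len(b)
    if max_iter is None:
        # n steps suffice in exact arithmetic; the extra allows for rounding.
        max_iter = 10 * n
    x = [0.0] * n
    r = b[:]              # residual r = b - A x, with the initial guess x = 0
    p = r[:]              # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)            # dominant Theta(n^2) step of each iteration
        alpha = rs_old / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # Next direction is A-conjugate to the previous search directions.
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))  # exact solution is [1/11, 7/11]
```

Each iteration is dominated by matvec(A, p), which evaluates n dot products of length n and therefore costs Θ(n²) operations when executed sequentially. One plausible reading of the Θ(n) figure in the abstract, stated here as an assumption, is that these n row dot products are computed concurrently by parallel pipelined units, so each iteration takes on the order of n cycles.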