A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem

  • Authors:
  • Abid Rafique;Nachiket Kapre;George A. Constantinides

  • Affiliations:
  • Electrical and Electronic Engineering, Imperial College London, London, UK;Electrical and Electronic Engineering, Imperial College London, London, UK;Electrical and Electronic Engineering, Imperial College London, London, UK

  • Venue:
  • ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
  • Year:
  • 2012

Abstract

Iterative numerical algorithms with high memory bandwidth requirements but medium-size data sets (matrix size ~ a few hundred) are highly appropriate for FPGA acceleration. This paper presents a streaming architecture comprising floating-point operators coupled with high-bandwidth on-chip memories for the Lanczos method, an iterative algorithm for symmetric eigenvalue computation. We show that the Lanczos method can be specialized to compute only the extremal eigenvalues, and present an architecture that achieves a sustained single-precision floating-point performance of 175 GFLOPs on a Virtex6-SX475T for a dense matrix of size 335×335. We perform a quantitative comparison with parallel implementations of the Lanczos method using the optimized Intel MKL and CUBLAS libraries for multi-core and GPU respectively. We find that, for a range of matrices, the FPGA implementation outperforms both multi-core and GPU: a speedup of 8.2-27.3× (13.4× geo. mean) over an Intel Xeon X5650 and 26.2-116× (52.8× geo. mean) over an Nvidia C2050 when the FPGA is solving a single eigenvalue problem, and a speedup of 41-520× (103× geo. mean) and 131-2220× (408× geo. mean) respectively when it is solving multiple eigenvalue problems.
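
For readers unfamiliar with the Lanczos method named in the abstract, the following is a minimal software sketch of how it can estimate the smallest and largest eigenvalues of a dense symmetric matrix. It is not the authors' FPGA architecture. The function name `lanczos_extremal`, the random start vector, the double-precision arithmetic, and the full reorthogonalization step are all assumptions made here for illustration and numerical robustness; the paper itself reports single-precision results.

```python
import numpy as np

def lanczos_extremal(A, k, seed=0):
    """Estimate the extremal eigenvalues of a symmetric matrix A
    with up to k Lanczos iterations (hypothetical helper, illustration only)."""
    n = A.shape[0]
    k = min(k, n)
    rng = np.random.default_rng(seed)

    # Random unit start vector (an assumption; any nonzero vector works in principle).
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)

    Q = np.zeros((k, n))      # rows hold the Lanczos vectors
    alphas, betas = [], []    # diagonal and off-diagonal of the tridiagonal matrix T
    q_prev = np.zeros(n)
    beta = 0.0

    for j in range(k):
        Q[j] = q
        # Three-term recurrence: the matrix-vector product dominates the cost.
        w = A @ q - beta * q_prev
        alpha = q @ w
        w -= alpha * q
        # Full reorthogonalization against all previous vectors
        # (added for robustness; not claimed to be part of the paper's design).
        w -= Q[: j + 1].T @ (Q[: j + 1] @ w)
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if j == k - 1 or beta < 1e-12:
            break             # finished, or an invariant subspace was found
        betas.append(beta)
        q_prev, q = q, w / beta

    # Eigenvalues of the small tridiagonal matrix approximate those of A,
    # with the extremal ones converging first.
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    theta = np.linalg.eigvalsh(T)
    return theta[0], theta[-1]
```

A possible usage, assuming a random symmetric test matrix of the size quoted in the abstract: build `A = (M + M.T) / 2` from a 335×335 random `M`, call `lanczos_extremal(A, 60)`, and compare the result against the first and last entries of `np.linalg.eigvalsh(A)`. The number of iterations needed for a given accuracy depends on the spectrum of the matrix.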