A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem

Authors:
Abid Rafique;Nachiket Kapre;George A. Constantinides
Affiliations:
Electrical and Electronic Engineering, Imperial College London, London, UK;Electrical and Electronic Engineering, Imperial College London, London, UK;Electrical and Electronic Engineering, Imperial College London, London, UK
Venue:
ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
Year:
2012


Abstract

Iterative numerical algorithms with high memory bandwidth requirements but medium-sized data sets (matrix sizes of a few hundred) are well suited to FPGA acceleration. This paper presents a streaming architecture for the Lanczos method, an iterative algorithm for computing the eigenvalues of symmetric matrices, that couples floating-point operators with high-bandwidth on-chip memories. We show that the Lanczos method can be specialized to compute only the extremal eigenvalues, and we present an architecture that achieves a sustained single-precision floating-point performance of 175 GFLOPs on a Virtex6-SX475T for a dense matrix of size 335×335. We compare it quantitatively with parallel implementations of the Lanczos method that use the optimized Intel MKL and CUBLAS libraries on a multi-core CPU and a GPU, respectively. For a range of matrices, the FPGA implementation outperforms both. When solving a single eigenvalue problem, it achieves a speedup of 8.2-27.3× (13.4× geo. mean) over an Intel Xeon X5650 and 26.2-116× (52.8× geo. mean) over an Nvidia C2050; when solving multiple eigenvalue problems, the speedups are 41-520× (103× geo. mean) and 131-2220× (408× geo. mean), respectively.
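
For readers unfamiliar with the method named in the abstract, the following is a minimal software sketch of the general idea: a few Lanczos steps reduce a dense symmetric matrix to a small tridiagonal one, and a single extremal eigenvalue of that tridiagonal matrix is then located without computing the full spectrum. It is not the paper's streaming architecture. The function names (`lanczos`, `count_below`, `kth_eigenvalue`), the random start vector, the double-precision arithmetic, the absence of reorthogonalization, and the use of Sturm-count bisection for the tridiagonal stage are all assumptions made for illustration; the abstract does not say how the authors handle these details.

```python
import math
import random


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def lanczos(A, steps, seed=0):
    """Plain Lanczos (no reorthogonalization) on symmetric A.

    Returns (alpha, beta): the diagonal and off-diagonal of the
    tridiagonal matrix T, with len(beta) == len(alpha) - 1.
    """
    n = len(A)
    steps = min(steps, n)
    rng = random.Random(seed)  # assumed: random unit start vector
    q = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    scale = math.sqrt(sum(v * v for v in q))
    q = [v / scale for v in q]
    q_prev = [0.0] * n
    alpha, beta = [], []
    b = 0.0
    for j in range(steps):
        w = matvec(A, q)
        a = sum(wi * qi for wi, qi in zip(w, q))
        # Three-term recurrence: w = A q_j - alpha_j q_j - beta_{j-1} q_{j-1}
        w = [wi - a * qi - b * pi for wi, qi, pi in zip(w, q, q_prev)]
        alpha.append(a)
        b = math.sqrt(sum(wi * wi for wi in w))
        if j == steps - 1 or b < 1e-12:  # last step, or invariant subspace found
            break
        beta.append(b)
        q_prev, q = q, [wi / b for wi in w]
    return alpha, beta


def count_below(alpha, beta, x):
    """Sturm count: number of eigenvalues of T strictly less than x."""
    count = 0
    d = 1.0
    for i, a in enumerate(alpha):
        off = beta[i - 1] * beta[i - 1] if i > 0 else 0.0
        d = a - x - off / d
        if d == 0.0:
            d = 1e-30  # nudge an exactly zero pivot to keep the recurrence defined
        if d < 0.0:
            count += 1
    return count


def kth_eigenvalue(alpha, beta, k, iters=100):
    """k-th smallest eigenvalue of T (1-indexed) by bisection.

    k=1 gives the smallest, k=len(alpha) the largest.
    """
    m = len(alpha)
    lo, hi = float("inf"), float("-inf")
    for i, a in enumerate(alpha):  # Gershgorin bounds on the spectrum of T
        r = (abs(beta[i - 1]) if i > 0 else 0.0) + (abs(beta[i]) if i < m - 1 else 0.0)
        lo, hi = min(lo, a - r), max(hi, a + r)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if count_below(alpha, beta, mid) >= k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Calling `lanczos(A, steps)` and then `kth_eigenvalue(alpha, beta, len(alpha))` yields an estimate of the largest eigenvalue of `A`, and `k=1` yields an estimate of the smallest. With fewer steps than the matrix dimension the result is an approximation that never exceeds the true largest eigenvalue (up to rounding). The dominant cost per step is the dense matrix-vector product, which is the memory-bandwidth-bound operation the abstract alludes to.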