A High Throughput FPGA-Based Floating Point Conjugate Gradient Implementation

Authors:
Antonio Roldao Lopes;George A. Constantinides
Affiliations:
Electrical & Electronic Engineering, Imperial College London, London, England SW7 2BT;Electrical & Electronic Engineering, Imperial College London, London, England SW7 2BT
Venue:
ARC '08 Proceedings of the 4th international workshop on Reconfigurable Computing: Architectures, Tools and Applications
Year:
2008

Abstract

As Field Programmable Gate Arrays (FPGAs) have reached capacities beyond millions of equivalent gates, it becomes possible to accelerate floating-point scientific computing applications. One type of calculation that is commonplace in scientific computation is the solution of systems of linear equations. A method that has proven in software to be very efficient and robust for finding such solutions is the Conjugate Gradient algorithm. In this paper we present a parallel hardware Conjugate Gradient implementation. The implementation is particularly suited for accelerating multiple small to medium sized dense systems of linear equations. Through parallelization it is possible to convert the computation time per iteration for an order n matrix from Θ(n^2) cycles for a software implementation to Θ(n). I/O requirements are scalable and converge to a constant value with the increase of matrix order. Results on a VirtexII-6000 demonstrate sustained performance of 5 GFLOPS and projected results on a Virtex5-330 indicate sustained performance of 35 GFLOPS. The former result is comparable to high-end CPUs, whereas the latter represents a significant speedup.
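
For orientation, the per-iteration cost mentioned in the abstract can be seen in a minimal software sketch of the textbook Conjugate Gradient method. This is not the authors' hardware design: the function names (mat_vec, dot, conjugate_gradient), the tolerance, and the iteration cap are illustrative assumptions, and the matrix is assumed to be dense, symmetric, and positive definite. The sketch only marks where the quadratic cost sits; the abstract does not describe how the hardware parallelizes it.

```python
def mat_vec(A, v):
    """Dense matrix-vector product: n dot products of length n (n*n multiply-adds)."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]


def dot(u, v):
    """Inner product of two equal-length vectors (n multiply-adds)."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive definite A (assumed, not checked)."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n  # illustrative cap, not taken from the paper
    x = [0.0] * n
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = mat_vec(A, p)  # the only step with quadratic cost per iteration
        alpha = rs_old / dot(p, Ap)
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * ap_i for r_i, ap_i in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [r_i + beta * p_i for r_i, p_i in zip(r, p)]
        rs_old = rs_new
    return x


# Small usage example with a 2x2 symmetric positive definite system.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```

In this sequential form, each iteration is dominated by the single call to mat_vec, while the remaining vector updates and inner products are linear in n. That is consistent with the abstract's contrast between quadratic software cost and linear cost once the matrix-vector work is spread across parallel arithmetic units.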