Parallel algorithms for banded linear systems
SIAM Journal on Scientific and Statistical Computing
Interior point methods for optimal control of discrete time systems
Journal of Optimization Theory and Applications
FPGAs vs. CPUs: trends in peak floating-point performance
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
Floating-point sparse matrix-vector multiply for FPGAs
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
64-bit floating-point FPGA matrix multiplication
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Understanding the efficiency of GPU algorithms for matrix-matrix multiplication
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
High Performance Linear Algebra Operations on Reconfigurable Systems
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
An FPGA-Based Floating-Point Jacobi Iterative Solver
ISPAN '05 Proceedings of the 8th International Symposium on Parallel Architectures,Algorithms and Networks
The potential of the cell processor for scientific computing
Proceedings of the 3rd conference on Computing frontiers
The Lanczos and Conjugate Gradient Algorithms: From Theory to Finite Precision Computations (Software, Environments, and Tools)
MIMO Wireless Communications
Anatomy of high-performance matrix multiplication
ACM Transactions on Mathematical Software (TOMS)
Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization
IEEE Transactions on Parallel and Distributed Systems
Multiobjective Optimization of FPGA-Based Medical Image Registration
FCCM '08 Proceedings of the 2008 16th International Symposium on Field-Programmable Custom Computing Machines
FPGA implementation of the conjugate gradient method
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
The Krawczyk algorithm: rigorous bounds for linear equation solution on an FPGA
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
Architectural support for multithreading on reconfigurable hardware
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
Multithreading on reconfigurable hardware: An architectural approach
Microprocessors & Microsystems
Hi-index | 0.00 |
Recent developments in the capacity of modern Field Programmable Gate Arrays (FPGAs) have significantly expanded their applications. One such field is the acceleration of scientific computation and one type of calculation that is commonplace in scientific computation is the solution of systems of linear equations. A method that has proven in software to be very efficient and robust for finding such solutions is the Conjugate Gradient (CG) algorithm. In this article we present a widely parallel and deeply pipelined hardware CG implementation, targeted at modern FPGA architectures. This implementation is particularly suited for accelerating multiple small-to-medium-sized dense systems of linear equations and can be used as a stand-alone solver or as building block to solve higher-order systems. In this article it is shown that through parallelization it is possible to convert the computation time per iteration for an order n matrix from Θ(n2) clock cycles on a microprocessor to Θ(n) on a FPGA. Through deep pipelining it is also possible to solve several problems in parallel and maximize both performance and efficiency. I/O requirements are shown to be scalable and convergent to a constant value with the increase of matrix order. Post place-and-route results on a readily available VirtexII-6000 demonstrate sustained performance of 5 GFlops, and results on a Virtex5-330 indicate sustained performance of 35 GFlops. A comparison with an optimized software implementation running on a high-end CPU demonstrate that this FPGA implementation represents a significant speedup of at least an order of magnitude.