High Performance Linear Algebra Operations on Reconfigurable Systems
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Journal of Parallel and Distributed Computing
A High Throughput FPGA-Based Floating Point Conjugate Gradient Implementation
ARC '08 Proceedings of the 4th international workshop on Reconfigurable Computing: Architectures, Tools and Applications
A High Throughput FPGA-Based Floating Point Conjugate Gradient Implementation for Dense Matrices
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
VFloat: A Variable Precision Fixed- and Floating-Point Library for Reconfigurable Hardware
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Self-Alignment Schemes for the Implementation of Addition-Related Floating-Point Operators
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Hi-index | 0.01 |
Within the parallel computing domain, field programmable gate arrays (FPGA) are no longer restricted to their traditional role as substitutes for application-specific integrated circuits as hardware "hidden" from the end user. Several high performance computing vendors offer parallel reconfigurable computers employing user-programmable FPGAs. These exciting new architectures allow end-users to, in effect, create reconfigurable coprocessors targeting the computationally intensive parts of each problem. The increased capability of contemporary FPGAs coupled with the embarrassingly parallel nature of the Jacobi iterative method make the Jacobi method an ideal candidate for hardware acceleration. This paper introduces a parameterized design for a deeply pipelined, highly parallelized IEEE 64-bit floating-point version of the Jacobi method. A Jacobi circuit is implemented using a Xilinx Virtex-II Pro as the target FPGA device. Implementation statistics and performance estimates are presented.