A MATLAB Compiler for Distributed, Heterogeneous, Reconfigurable Computing Systems
FCCM '00 Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines
Iterative Methods for Sparse Linear Systems
Iterative Methods for Sparse Linear Systems
Fpga Hardware Synthesis From Matlab
VLSID '01 Proceedings of the The 14th International Conference on VLSI Design (VLSID '01)
Computer Architecture: A Quantitative Approach
Computer Architecture: A Quantitative Approach
FPGAs vs. CPUs: trends in peak floating-point performance
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
64-bit floating-point FPGA matrix multiplication
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Designing Scalable FPGA-Based Reduction Circuits Using Pipelined Floating-Point Cores
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
High Performance Linear Algebra Operations on Reconfigurable Systems
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
An FPGA-Based Floating-Point Jacobi Iterative Solver
ISPAN '05 Proceedings of the 8th International Symposium on Parallel Architectures,Algorithms and Networks
High-Performance and Area-Efficient Reduction Circuits on FPGAs
SBAC-PAD '05 Proceedings of the 17th International Symposium on Computer Architecture on High Performance Computing
A Hybrid Approach for Mapping Conjugate Gradient onto an FPGA-Augmented Reconfigurable Supercomputer
FCCM '06 Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Advanced Components in the Variable Precision Floating-Point Library
FCCM '06 Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Rapid Prototyping Projection Algorithms with FPGA Technology
RSP '09 Proceedings of the 2009 IEEE/IFIP International Symposium on Rapid System Prototyping
Journal of Parallel and Distributed Computing
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
Reconfigurable computers (RCs) combine general-purpose processors (GPPs) with field programmable gate arrays (FPGAs). The FPGAs are reconfigured at run time to become application-specific processors that collaborate with the GPPs to execute the application. High-level language (HLL) to hardware description language (HDL) compilers allow the FPGA-based kernels to be generated using HLL-based programming rather than HDL-based hardware design. Unfortunately, the loops needed for floating-point reduction operations often cannot be pipelined by these HLL-HDL compilers. This capability gap prevents the development of a number of important FPGA-based kernels. This article describes a novel architecture and algorithm that allow the use of an HLL-HDL environment to implement high-performance FPGA-based kernels that reduce multiple, variable-length sets of floating-point data. A sparse matrix iterative solver is used to demonstrate the effectiveness of the reduction kernel. The FPGA-augmented version running on a contemporary RC is up to 2.4 times faster than the software-only version of the same solver running on the GPP. Conservative estimates show the solver will run up to 6.3 times faster than software on a next-generation RC.