High Performance Linear Algebra Operations on Reconfigurable Systems
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Numerical linear algebra operations are key primitives in scientific computing. Performance optimizations of these operations have been investigated extensively, and the basic operations have been implemented as software libraries. With rapid advances in technology, hardware acceleration of linear algebra applications using FPGAs (Field-Programmable Gate Arrays) has become feasible. In this paper, we propose FPGA-based designs for several BLAS operations, including vector product, matrix-vector multiply, and matrix multiply. By identifying the design parameters for each BLAS operation, we analyze the design tradeoffs. In our implementations, the values of the design parameters are determined by the hardware constraints, such as the available area, the size of on-chip memory, the external memory bandwidth, and the number of I/O pins. The proposed designs are implemented on a Xilinx Virtex-II Pro FPGA.
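The abstract's central tradeoff — a design parameter (such as block size) bounded by on-chip memory and external bandwidth — can be illustrated in software even though the paper's designs are hardware. The sketch below is a hypothetical software analogue, not the paper's actual architecture: a blocked matrix multiply in which the block size `BS` stands in for the amount of on-chip storage available to the compute array, controlling how much data is reused before the next fetch from external memory.

```c
#include <stddef.h>

/* Illustrative only: a blocked matrix multiply C = A * B for n x n
 * row-major matrices. BS models a design parameter: on an FPGA it
 * would be bounded by on-chip (BlockRAM) capacity; a larger BS means
 * more data reuse per external-memory access, at the cost of more
 * on-chip storage. */
#define BS 4  /* block size: analogous to on-chip storage per pass */

void matmul_blocked(size_t n, const float *A, const float *B, float *C) {
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0f;
    /* Outer loops walk over BS x BS tiles; each tile-pair product is
     * the unit of work a parallel hardware array would evaluate. */
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t jj = 0; jj < n; jj += BS)
            for (size_t kk = 0; kk < n; kk += BS)
                for (size_t i = ii; i < ii + BS && i < n; i++)
                    for (size_t j = jj; j < jj + BS && j < n; j++)
                        for (size_t k = kk; k < kk + BS && k < n; k++)
                            C[i*n + j] += A[i*n + k] * B[k*n + j];
}
```

In the hardware setting described by the abstract, the analogous parameters (number of processing elements, words of on-chip memory per element) would be chosen to saturate either the available area or the external memory bandwidth, whichever binds first.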