Recently, reconfigurable computing systems have been built that employ Field-Programmable Gate Arrays (FPGAs) as hardware accelerators for general-purpose processors. These systems provide new opportunities for high-performance computing. In this paper, we investigate hybrid designs that effectively utilize both the FPGAs and the processors in reconfigurable computing systems. Based on a high-level computational model, we propose designs for floating-point matrix multiplication and block LU decomposition. In our designs, the workload of an application is partitioned between the FPGAs and the processors in a balanced way, so that the FPGAs and processors work cooperatively without data hazards or memory access conflicts. Experimental results on the Cray XD1 show that with one Xilinx XC2VP50 FPGA (a relatively small device available in the XD1) and a 2.2 GHz AMD processor, our designs achieve up to a 1.4X speedup over a processor-only design and up to a 2X speedup over an FPGA-only design. The performance of our designs scales with the number of nodes. Moreover, our designs achieve higher performance when improved floating-point units or larger devices are used.
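The balanced partitioning idea can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the rows of C = A × B are split between a CPU worker and an FPGA worker in proportion to their assumed throughputs, so both finish at roughly the same time. The throughput numbers and helper names are invented for the example.

```python
def split_rows(n_rows, cpu_gflops, fpga_gflops):
    """Assign the first k rows of C to the FPGA and the rest to the CPU,
    with each unit's share matching its fraction of total throughput.
    (Throughput values here are illustrative assumptions.)"""
    total = cpu_gflops + fpga_gflops
    k = round(n_rows * fpga_gflops / total)
    return k  # rows [0, k) -> FPGA, rows [k, n_rows) -> CPU

def matmul_rows(A, B, row_lo, row_hi):
    """Plain triple-loop multiply over a row slice of A, standing in for
    the kernel each compute unit would run on its disjoint row block.
    Disjoint row blocks mean no write conflicts between the two units."""
    m, p = len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(m)) for j in range(p)]
            for i in range(row_lo, row_hi)]

# Small demo: multiply A by the identity, split across the two "units".
n = 6
A = [[i + j for j in range(n)] for i in range(n)]
I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

k = split_rows(n, cpu_gflops=4.4, fpga_gflops=2.2)  # illustrative ratio 2:1
C = matmul_rows(A, I, 0, k) + matmul_rows(A, I, k, n)
assert C == A  # each unit computed its block; concatenation is correct
```

Because each unit owns a contiguous, disjoint block of output rows, the two workers never write to the same memory, which mirrors the abstract's claim of cooperation without data hazards or memory access conflicts.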