Sparse Matrix-Vector Multiplication on a Reconfigurable Supercomputer with Application

Authors:
David Dubois;Andrew Dubois;Thomas Boorman;Carolyn Connor;Steve Poole
Affiliations:
Los Alamos National Laboratory;Los Alamos National Laboratory;Los Alamos National Laboratory;Los Alamos National Laboratory;Oak Ridge National Laboratory
Venue:
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Year:
2010

Abstract

Double-precision floating-point Sparse Matrix-Vector Multiplication (SMVM) is a critical computational kernel in iterative solvers for sparse systems of linear equations. The poor data locality exhibited by sparse matrices, together with the high memory-bandwidth requirements of SMVM, results in poor performance on general-purpose processors. Field Programmable Gate Arrays (FPGAs) offer a possible alternative with their customizable, application-targeted memory subsystem and processing elements. In this work we investigate two separate implementations of SMVM on an SRC-6 MAPStation workstation. The first implementation explores peak performance capability, while the second balances the amount of instantiated logic against the available sustained bandwidth of the FPGA subsystem. Both implementations yield the same sustained performance, with the second producing a much more efficient solution. The metrics of processor balance and application balance are introduced to provide insight into the efficiencies of the FPGA- and CPU-based solutions, explicitly showing the tight coupling of available bandwidth to peak floating-point performance. Because an FPGA design can balance the amount of implemented logic against the available memory bandwidth, it can provide a much more efficient solution. Finally, making use of the lessons learned in implementing the SMVM, we present a fully implemented non-preconditioned Conjugate Gradient algorithm that uses the second SMVM design.
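
For readers unfamiliar with the two kernels named in the abstract, the following is a minimal software sketch of SMVM and of a non-preconditioned Conjugate Gradient loop built on it. It is not the paper's FPGA design. It assumes the compressed sparse row (CSR) storage format, which the abstract does not specify, and the names csr_matvec, conjugate_gradient, values, col_idx, and row_ptr are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def csr_matvec(values: Sequence[float], col_idx: Sequence[int],
               row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx, the irregular access pattern
            # behind the poor data locality mentioned in the abstract.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(values: Sequence[float], col_idx: Sequence[int],
                       row_ptr: Sequence[int], b: Sequence[float],
                       tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Textbook non-preconditioned CG for a symmetric positive definite A."""
    x = [0.0] * len(b)       # initial guess x0 = 0
    r = list(b)              # residual r0 = b - A @ x0 = b
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = csr_matvec(values, col_idx, row_ptr, p)  # one SMVM per iteration
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Illustrative 3x3 symmetric positive definite matrix:
# [[4, 1, 0],
#  [1, 3, 0],
#  [0, 0, 2]]
values = [4.0, 1.0, 1.0, 3.0, 2.0]
col_idx = [0, 1, 0, 1, 2]
row_ptr = [0, 2, 4, 5]

y = csr_matvec(values, col_idx, row_ptr, [1.0, 1.0, 1.0])
solution = conjugate_gradient(values, col_idx, row_ptr, [1.0, 2.0, 4.0])
```

In this sketch each nonzero contributes one multiply and one add but requires loading a matrix value, a column index, and an element of x, which gives a rough sense of why the kernel tends to be limited by memory bandwidth rather than by floating-point throughput.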