Sparse Matrix-Vector multiplication on FPGAs

Authors:
Ling Zhuo;Viktor K. Prasanna
Affiliations:
University of Southern California, Los Angeles, CA;University of Southern California, Los Angeles, CA
Venue:
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Year:
2005

Citing 10
Cited 29

Numerical recipes in C (2nd ed.): the art of scientific computing

Numerical recipes in C (2nd ed.): the art of scientific computing
Improving the memory-system performance of sparse-matrix vector multiplication

IBM Journal of Research and Development
Improving performance of sparse matrix-vector multiplication

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Templates for the solution of algebraic eigenvalue problems: a practical guide

Templates for the solution of algebraic eigenvalue problems: a practical guide
On Sparse Matrix-Vector Multiplication with FPGA-Based System

FCCM '02 Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance

FCCM '04 Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Designing Scalable FPGA-Based Reduction Circuits Using Pipelined Floating-Point Cores

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
High Level Programming for FPGA Based Image and Video Processing Using Hardware Skeletons

FCCM '01 Proceedings of the the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Sparsity: Optimization Framework for Sparse Matrix Kernels

International Journal of High Performance Computing Applications
High-Performance FPGA-Based General Reduction Methods

FCCM '05 Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines

Designing Scalable FPGA-Based Reduction Circuits Using Pipelined Floating-Point Cores

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
High Performance Linear Algebra Operations on Reconfigurable Systems

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Embedded floating-point units in FPGAs

Proceedings of the 2006 ACM/SIGDA 14th international symposium on Field programmable gate arrays
Scalable Hybrid Designs for Linear Algebra on Reconfigurable Computing Systems

ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Architectures and APIs: assessing requirements for delivering FPGA performance to applications

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on Reconfigurable Computing Systems

IEEE Transactions on Parallel and Distributed Systems
High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs

IEEE Transactions on Parallel and Distributed Systems
Multivariate Gaussian Random Number Generation Targeting Reconfigurable Hardware

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Area-efficient arithmetic expression evaluation using deeply pipelined floating-point cores

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Architectural modifications to enhance the floating-point performance of FPGAs

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Computation reuse in domain-specific optimization of signal recognition

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Floating-point divider design for FPGAs

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation

Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
From Silicon to Science: The Long Road to Production Reconfigurable Supercomputing

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
An integrated reduction technique for a double precision accumulator

Proceedings of the Third International Workshop on High-Performance Reconfigurable Computing Technology and Applications
Sparse Matrix-Vector Multiplication on a Reconfigurable Supercomputer with Application

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
3-D brain MRI tissue classification on FPGAs

IEEE Transactions on Image Processing
Haptic rendering of deformable objects using a multiple FPGA parallel computing architecture

Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
Fast, Efficient Floating-Point Adders and Multipliers for FPGAs

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
FPGA-Array with Bandwidth-Reduction Mechanism for Scalable and Power-Efficient Numerical Simulations Based on Finite Difference Methods

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Layout aware optimization of high speed fixed coefficient FIR filters for FPGAs

International Journal of Reconfigurable Computing
Domain-Specific Optimization of Signal Recognition Targeting FPGAs

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Optimizing memory bandwidth use and performance for matrix-vector multiplication in iterative methods

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Optimising memory bandwidth use for matrix-vector multiplication in iterative methods

ARC'10 Proceedings of the 6th international conference on Reconfigurable Computing: architectures, Tools and Applications
Portable and scalable FPGA-based acceleration of a direct linear system solver

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Improving performance of codes with large/irregular stride memory access patterns via high performance reconfigurable computers

Journal of Parallel and Distributed Computing
A scalable sparse matrix-vector multiplication kernel for energy-efficient sparse-blas on FPGAs

Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
Compiled multithreaded data paths on FPGAs for dynamic workloads

Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
A Multiple-FPGA parallel computing architecture for real-time simulation of soft-object deformation

ACM Transactions on Embedded Computing Systems (TECS)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Floating-point Sparse Matrix-Vector Multiplication (SpMXV) is a key computational kernel in scientific and engineering applications. The poor data locality of sparse matrices significantly reduces the performance of SpMXV on general-purpose processors, which rely heavily on the cache hierarchy to achieve high performance. The abundant hardware resources on current FPGAs provide new opportunities to improve the performance of SpMXV. In this paper, we propose an FPGA-based design for SpMXV. Our design accepts sparse matrices in Compressed Row Storage format, and makes no assumptions about the sparsity structure of the input matrix. The design employs IEEE-754 format double-precision floating-point multipliers/adders, and performs multiple floating-point operations as well as I/O operations in parallel. The performance of our design for SpMXV is evaluated using various sparse matrices from the scientific computing community, with the Xilinx Virtex-II Pro XC2VP70 as the target device. The MFLOPS performance increases with the hardware resources on the device as well as the available memory bandwidth. For example, when the memory bandwidth is 8 GB/s, our design achieves over 350 MFLOPS for all the test matrices. It demonstrates significant speedup over general-purpose processors particularly for matrices with very irregular sparsity structure. Besides solving SpMXV problem, our design provides a parameterized and flexible tree-based design for floating-point applications on FPGAs.