Towards a Universal FPGA Matrix-Vector Multiplication Architecture

Authors:
Srinidhi Kestur;John D. Davis;Eric S. Chung
Venue:
FCCM '12 Proceedings of the 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines
Year:
2012

Abstract

We present the design and implementation of a universal, single-bitstream library for accelerating matrix-vector multiplication using FPGAs. Our library handles multiple matrix encodings, ranging from dense to several sparse formats. A key novelty in our approach is the introduction of a hardware-optimized sparse matrix representation called Compressed Variable-Length Bit Vector (CVBV), which reduces storage and bandwidth requirements by up to 43% (25% on average) compared to compressed sparse row (CSR) across all the matrices from the University of Florida Sparse Matrix Collection. Our hardware incorporates a runtime-programmable decoder that performs on-the-fly decoding of various formats such as Dense, COO, CSR, DIA, and ELL. The flexibility and scalability of our design are demonstrated across two FPGA platforms: (1) the BEE3 (Virtex-5 LX155T with 16GB of DRAM) and (2) the ML605 (Virtex-6 LX240T with 2GB of DRAM). For dense matrices, our approach scales to large data sets with over 1 billion elements and achieves robust performance independent of the matrix aspect ratio. For sparse matrices, our approach, which uses a compressed representation, reduces the overall bandwidth while achieving efficiency comparable to state-of-the-art approaches.
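
For readers unfamiliar with the formats named in the abstract, the following is a minimal software sketch of matrix-vector multiplication over two of them, COO and CSR. It is an illustration only and does not reproduce the paper's hardware decoder or the CVBV format, whose encoding the abstract does not describe. All function and variable names (`dense_to_csr`, `csr_matvec`, `dense_to_coo`, `coo_matvec`) are hypothetical and chosen for this example.

```python
def dense_to_csr(matrix):
    """Encode a dense row-major matrix as CSR: (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in values/col_idx
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def dense_to_coo(matrix):
    """Encode a dense matrix as COO: a list of (row, col, value) triples."""
    return [(i, j, v)
            for i, row in enumerate(matrix)
            for j, v in enumerate(row)
            if v != 0]


def coo_matvec(entries, n_rows, x):
    """Compute y = A @ x for a matrix A stored as COO triples."""
    y = [0] * n_rows
    for i, j, v in entries:
        y[i] += v * x[j]
    return y


# Small example matrix (3 x 4) with four nonzeros, assumed for illustration.
A = [[4, 0, 0, 1],
     [0, 0, 2, 0],
     [0, 3, 0, 0]]
x = [1, 2, 3, 4]

values, col_idx, row_ptr = dense_to_csr(A)
y_csr = csr_matvec(values, col_idx, row_ptr, x)
y_coo = coo_matvec(dense_to_coo(A), len(A), x)
```

In this sketch COO stores a row index per nonzero, whereas CSR replaces those with one row pointer per row. That kind of index overhead is what a more compact encoding aims to reduce, and CSR is the baseline the abstract uses for its storage comparison.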