Optimising memory bandwidth use for matrix-vector multiplication in iterative methods

  • Authors:
  • David Boland; George A. Constantinides

  • Affiliations:
  • Electrical and Electronic Engineering Department, Imperial College London, London, UK (both authors)

  • Venue:
  • ARC'10: Proceedings of the 6th International Conference on Reconfigurable Computing: Architectures, Tools and Applications
  • Year:
  • 2010

Abstract

Computing the solution to a system of linear equations is a fundamental problem in scientific computing, and its acceleration has drawn wide interest in the FPGA community [1, 2, 3]. One class of algorithms for solving these systems, iterative methods, has drawn particular interest, with recent literature showing large performance improvements over general purpose processors (GPPs). In several iterative methods, this performance gain is largely a result of parallelisation of the matrix-vector multiplication, an operation that occurs in many applications and hence has also been widely studied on FPGAs [4, 5]. However, whilst the performance of matrix-vector multiplication on FPGAs is generally I/O bound [4], the nature of iterative methods allows the use of on-chip memory buffers to increase the available bandwidth, providing the potential for significantly more parallelism [6]. Unfortunately, existing approaches have generally either handled large matrices with only limited improvement over GPPs [4, 5, 6] or achieved high performance only for relatively small matrices [2, 3]. This paper proposes hardware designs that exploit symmetric and banded matrix structure, together with methods to optimise on-chip RAM use, both to increase performance and to sustain it for larger-order matrices.
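
To make the bandwidth argument concrete, the following is a minimal C sketch of a symmetric banded matrix-vector multiply, y = Ax, in which only the main diagonal and the super-diagonals inside the band are stored. Each off-diagonal value is fetched once but contributes to two output elements, roughly halving the memory traffic for the band relative to a non-symmetric implementation. The storage layout (band[d][i] = A[i][i+d]) and all names here are illustrative assumptions, not the paper's hardware design.

    /* Sketch: symmetric banded matrix-vector multiply, y = A*x.
     * Only the main diagonal and BW super-diagonals are stored;
     * band[d][i] = A[i][i+d] (an assumed layout, zero-padded at the end).
     * Each off-diagonal entry is read once and used twice via symmetry. */
    #include <stdio.h>

    #define N  8   /* matrix order (illustrative) */
    #define BW 2   /* number of stored super-diagonals */

    void symm_band_matvec(const double band[BW + 1][N],
                          const double x[N], double y[N])
    {
        for (int i = 0; i < N; i++)
            y[i] = band[0][i] * x[i];          /* main diagonal */

        for (int d = 1; d <= BW; d++)          /* each stored super-diagonal */
            for (int i = 0; i + d < N; i++) {
                double a = band[d][i];         /* A[i][i+d] == A[i+d][i] */
                y[i]     += a * x[i + d];      /* upper-triangle contribution */
                y[i + d] += a * x[i];          /* mirrored lower-triangle term */
            }
    }

    int main(void)
    {
        /* Symmetric matrix: diagonal 4, first off-diagonal 1, second 0.5. */
        double band[BW + 1][N] = {
            {4, 4, 4, 4, 4, 4, 4, 4},
            {1, 1, 1, 1, 1, 1, 1, 0},
            {0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0, 0}
        };
        double x[N] = {1, 1, 1, 1, 1, 1, 1, 1}, y[N];

        symm_band_matvec(band, x, y);
        for (int i = 0; i < N; i++)
            printf("y[%d] = %g\n", i, y[i]);
        return 0;
    }

For a matrix of order n with b super-diagonals, each multiplication touches roughly n(b + 1) stored matrix elements rather than n^2, which is the structural property that lets designs of this kind keep larger matrices within on-chip memory and bandwidth limits.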