FPGA-Based High-Performance and Scalable Block LU Decomposition Architecture

Authors:
Manish Kumar Jaiswal;Nitin Chandrachoodan
Affiliations:
Indian Institute of Technology, Madras;Indian Institute of Technology, Madras
Venue:
IEEE Transactions on Computers
Year:
2012

Citing 0
Cited 4

High performance reconfigurable architecture for double precision floating point division
ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications

Self-Alignment Schemes for the Implementation of Addition-Related Floating-Point Operators
ACM Transactions on Reconfigurable Technology and Systems (TRETS)

Area-efficient architectures for double precision multiplier on FPGA, with run-time-reconfigurable dual single precision support
Microelectronics Journal

Scalable matrix decompositions with multiple cores on FPGAs
Microprocessors & Microsystems

Quantified Score

Hi-index	14.98

Abstract

Decomposition of a matrix into lower and upper triangular matrices (LU decomposition) is a vital part of many scientific and engineering applications, and the block LU decomposition algorithm is well suited to parallel hardware implementation. This paper presents an approach to speeding up the block LU decomposition algorithm using FPGA hardware. Unlike most previous approaches reported in the literature, it does not assume that the matrix can be stored entirely on chip. Memory accesses are studied for various FPGA configurations, and a schedule of operations that scales well is presented. The design has been synthesized for FPGA targets and can be easily retargeted. It outperforms previous hardware implementations as well as tuned software implementations, including the ATLAS and MKL libraries, on workstations.
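
For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal software sketch of a textbook right-looking block LU decomposition. It is not the paper's hardware architecture or its operation schedule, which the abstract does not describe. The names `block_lu`, `matmul`, and the block-size parameter `bs` are hypothetical, and the sketch assumes no pivoting is needed (for example, a strictly diagonally dominant matrix).

```python
def block_lu(a, bs):
    """Right-looking block LU without pivoting (assumes nonzero pivots).

    a  : square matrix as a list of lists of floats
    bs : block size (hypothetical parameter name)
    Returns (L, U) with L unit lower triangular and U upper triangular.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy, leave the input untouched
    for k in range(0, n, bs):
        e = min(k + bs, n)  # end of the current block (last block may be short)

        # Step 1: unblocked LU of the tall panel a[k:n, k:e]
        for j in range(k, e):
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]
                for c in range(j + 1, e):
                    a[i][c] -= a[i][j] * a[j][c]

        # Step 2: row block U12 = L11^{-1} * A12 (forward substitution, unit diagonal)
        for j in range(k, e):
            for i in range(j + 1, e):
                for c in range(e, n):
                    a[i][c] -= a[i][j] * a[j][c]

        # Step 3: trailing submatrix update A22 -= L21 * U12
        for i in range(e, n):
            for j in range(k, e):
                lij = a[i][j]
                for c in range(e, n):
                    a[i][c] -= lij * a[j][c]

    L = [[a[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


def matmul(x, y):
    """Plain dense matrix product, used only to check that L * U equals A."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]
```

Step 3 is a matrix-matrix multiply and dominates the arithmetic, which is one reason blocked formulations are generally considered a good fit for parallel hardware. When the matrix does not fit in on-chip memory, the block size governs how much data must be held locally at each step.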