A Fine-Grained Pipelined Implementation for Large-Scale Matrix Inversion on FPGA

Authors:
Jie Zhou;Yong Dou;Jianxun Zhao;Fei Xia;Yuanwu Lei;Yuxing Tang
Affiliations:
National Laboratory for Parallel & Distributed Processing, NUDT, Changsha, P.R. China 410073;National Laboratory for Parallel & Distributed Processing, NUDT, Changsha, P.R. China 410073;Academy of Armored Forces Engineering, Beijing, China 100072;National Laboratory for Parallel & Distributed Processing, NUDT, Changsha, P.R. China 410073;National Laboratory for Parallel & Distributed Processing, NUDT, Changsha, P.R. China 410073;National Laboratory for Parallel & Distributed Processing, NUDT, Changsha, P.R. China 410073
Venue:
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
Year:
2009

Abstract

Large-scale matrix inversion plays an important role in many applications. However, to the best of our knowledge, there is no FPGA-based implementation. In this paper, we explore the possibility of accelerating large-scale matrix inversion on an FPGA. To exploit the computational potential of the FPGA, we introduce a fine-grained parallel algorithm for matrix inversion. A scalable linear array of processing elements (PEs), which is the core component of the FPGA accelerator, is proposed to implement this algorithm. A total of 12 PEs can be integrated into an Altera Stratix II EP2S130F1020C5 FPGA on our self-designed board. Experimental results show that a speedup of 2.6 and a maximum power-performance of 41 can be achieved compared to a Pentium Dual CPU with double SSE threads.
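
The idea of spreading an inversion over a linear array of PEs can be illustrated with a small software sketch. The abstract does not say which inversion algorithm the PEs run, so everything below is an assumption made for illustration: Gauss-Jordan elimination with partial pivoting is used as a stand-in, the columns of the augmented matrix [A | I] are dealt round-robin to the PEs, and the names `invert_with_linear_array`, `num_pes`, and `owned` are hypothetical. The PEs are simulated sequentially; on hardware each PE's column updates could proceed concurrently, because they depend only on the broadcast pivot column.

```python
def invert_with_linear_array(a, num_pes=12):
    """Invert the square matrix `a` (a list of rows) by Gauss-Jordan elimination.

    Assumption for illustration only: columns of the augmented matrix
    [A | I] are dealt round-robin to `num_pes` simulated PEs, and each
    PE updates only the columns it owns.
    """
    n = len(a)
    # Build the augmented matrix [A | I].
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    # Column ownership: PE p holds columns p, p + num_pes, p + 2*num_pes, ...
    owned = [list(range(p, 2 * n, num_pes)) for p in range(num_pes)]

    for k in range(n):
        # Partial pivoting: choose the row with the largest |entry| in column k.
        pivot_row = max(range(k, n), key=lambda r: abs(aug[r][k]))
        if abs(aug[pivot_row][k]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        # The row swap touches every column, so every PE takes part in it.
        aug[k], aug[pivot_row] = aug[pivot_row], aug[k]

        # Column k is read once and "broadcast" to all PEs before any update.
        pivot = aug[k][k]
        factors = [aug[r][k] for r in range(n)]

        # Each PE updates its own columns independently of the others.
        for cols in owned:
            for c in cols:
                scaled = aug[k][c] / pivot
                for r in range(n):
                    if r != k:
                        aug[r][c] -= factors[r] * scaled
                aug[k][c] = scaled

    # The right half of [I | A^-1] is the inverse.
    return [row[n:] for row in aug]


if __name__ == "__main__":
    for row in invert_with_linear_array([[4, 7], [2, 6]]):
        print(row)
```

With `num_pes` larger than the number of columns, some simulated PEs simply own no columns, which mirrors how a fixed-size array would be underused on a small matrix.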