Large-scale matrix inversion plays an important role in many applications. However, to the best of our knowledge, no FPGA-based implementation exists. In this paper, we explore the possibility of accelerating large-scale matrix inversion on an FPGA. To exploit the computational potential of the FPGA, we introduce a fine-grained parallel algorithm for matrix inversion. A scalable linear array of processing elements (PEs), which is the core component of the FPGA accelerator, is proposed to implement this algorithm. A total of 12 PEs can be integrated into an Altera Stratix II EP2S130F1020C5 FPGA on our self-designed board. Experimental results show that a speedup factor of 2.6 and a maximum power-performance of 41 can be achieved compared to a Pentium Dual CPU with double SSE threads.