Matrix factorization algorithms such as LU, QR, and Cholesky are the most widely used methods for solving dense linear systems of equations, and they have been extensively studied and implemented on vector and parallel computers. In this paper, we present parallel LU, QR, and Cholesky factorization routines that use "algorithmic blocking" on a two-dimensional block-cyclic data distribution. With algorithmic blocking, it is possible to obtain near-optimal performance irrespective of the physical block size. The routines are implemented on the SGI/Cray T3E and compared with the corresponding ScaLAPACK factorization routines.
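As background for the data layout named above, the following is a minimal sketch of how a two-dimensional block-cyclic distribution maps a global matrix entry to a process and a local index. It illustrates only the standard distribution, not the algorithmic blocking technique or the routines of the paper. The function names and parameters (`block_cyclic_owner`, `owner_2d`, `mb`, `nb`, `prow`, `pcol`) are hypothetical, and the sketch assumes that the first block is stored on process (0, 0).

```python
def block_cyclic_owner(g, nb, nprocs):
    """Map global index g to (process, local index) in a 1-D block-cyclic layout.

    Assumption: blocks of size nb are dealt out round-robin starting at process 0.
    """
    block = g // nb                      # global block that contains g
    proc = block % nprocs                # blocks are assigned cyclically
    local = (block // nprocs) * nb + g % nb  # local blocks before it, plus offset
    return proc, local


def owner_2d(i, j, mb, nb, prow, pcol):
    """Map global entry (i, j) to ((p, q), (li, lj)) on a prow x pcol process grid.

    Rows use block size mb and columns use block size nb (hypothetical names).
    """
    p, li = block_cyclic_owner(i, mb, prow)
    q, lj = block_cyclic_owner(j, nb, pcol)
    return (p, q), (li, lj)


if __name__ == "__main__":
    # Entry (5, 7) with 2 x 2 physical blocks on a 2 x 3 process grid.
    print(owner_2d(5, 7, 2, 2, 2, 3))
```

In this layout the physical block sizes `mb` and `nb` fix where each entry lives, which is why a factorization whose performance does not depend on the physical block size is of interest.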