Computing rank-revealing QR factorizations of dense matrices
ACM Transactions on Mathematical Software (TOMS)
Parallel Partial Stabilizing Algorithms for Large Linear Control Systems
The Journal of Supercomputing
Efficient Algorithms for the Block Hessenberg Form
The Journal of Supercomputing
Parallel Pole Assignment of Single-Input Systems
VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
State-space truncation methods for parallel model reduction of large-scale systems
Parallel Computing - Special issue: Parallel and distributed scientific and engineering computing
Algorithm 853: An efficient algorithm for solving rank-deficient least squares problems
ACM Transactions on Mathematical Software (TOMS)
Partial stabilisation of large-scale discrete-time linear control systems
International Journal of Computational Science and Engineering
On the Failure of Rank-Revealing QR Factorization Software -- A Case Study
ACM Transactions on Mathematical Software (TOMS)
Solving linear-quadratic optimal control problems on parallel computers
Optimization Methods & Software
Computing approximate Fekete points by QR factorizations of Vandermonde matrices
Computers & Mathematics with Applications
Parallel model reduction of large linear descriptor systems via balanced truncation
VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
The QR factorization with column pivoting (QRP), originally suggested by Golub [Numer. Math., 7 (1965), 206--216], is a popular approach to computing rank-revealing factorizations. It was implemented with Level 1 BLAS in LINPACK and with Level 2 BLAS in LAPACK. Although the Level 2 BLAS version generally outperforms the Level 1 BLAS version, it can be slower for large matrices because of cache effects. We introduce a modification of the QRP algorithm that allows the use of Level 3 BLAS kernels while maintaining the numerical behavior of the LINPACK and LAPACK implementations. Experimental comparisons of this approach with the LINPACK and LAPACK implementations on IBM RS/6000, SGI R8000, and DEC AXP platforms show considerable performance improvements.
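As a rough illustration of the unblocked QRP that the LINPACK and LAPACK codes implement, here is a minimal pure-Python sketch, assuming a dense real matrix given as a list of rows. The function name `qr_column_pivoting`, the relative rank tolerance `tol`, and the norm-recomputation threshold are illustrative choices, loosely patterned on the norm-downdating strategy in LAPACK's `xLAQP2`. This is the classical column-at-a-time algorithm, not the Level 3 BLAS variant introduced above.

```python
import math
import sys


def qr_column_pivoting(A, tol=1e-12):
    """Householder QR with column pivoting on a copy of A (list of rows).

    Returns (R, perm, rank): R is the m x n upper-trapezoidal factor,
    perm[j] is the original index of the column moved to position j,
    and rank counts |R[k][k]| > tol * |R[0][0]| (an illustrative criterion).
    """
    m, n = len(A), len(A[0])
    R = [list(map(float, row)) for row in A]
    perm = list(range(n))
    # Below this ratio the downdated norm is recomputed from scratch,
    # because cancellation may have destroyed its accuracy.
    tol3z = math.sqrt(sys.float_info.epsilon)
    norms = [math.sqrt(sum(R[i][j] ** 2 for i in range(m))) for j in range(n)]
    ref = norms[:]  # norm value at the last full recomputation

    for k in range(min(m, n)):
        # Pivot: move the trailing column with the largest partial norm to k.
        p = max(range(k, n), key=lambda j: norms[j])
        if p != k:
            for row in R:
                row[k], row[p] = row[p], row[k]
            perm[k], perm[p] = perm[p], perm[k]
            norms[k], norms[p] = norms[p], norms[k]
            ref[k], ref[p] = ref[p], ref[k]

        # Householder reflector H = I - 2 v v^T / (v^T v) mapping R[k:, k] to alpha*e1.
        x = [R[i][k] for i in range(k, m)]
        alpha = math.sqrt(sum(t * t for t in x))
        if alpha > 0.0:
            if x[0] > 0.0:
                alpha = -alpha  # sign choice avoids cancellation in v[0]
            v = x[:]
            v[0] -= alpha
            vtv = sum(t * t for t in v)
            # Apply H to the trailing columns (a rank-1, Level 2 style update).
            for j in range(k, n):
                f = 2.0 * sum(v[i - k] * R[i][j] for i in range(k, m)) / vtv
                for i in range(k, m):
                    R[i][j] -= f * v[i - k]
            R[k][k] = alpha
            for i in range(k + 1, m):
                R[i][k] = 0.0

        # Downdate the partial norms of the remaining columns by removing row k.
        for j in range(k + 1, n):
            if norms[j] == 0.0:
                continue
            t = max(0.0, 1.0 - (abs(R[k][j]) / norms[j]) ** 2)
            if t * (norms[j] / ref[j]) ** 2 <= tol3z:
                norms[j] = math.sqrt(sum(R[i][j] ** 2 for i in range(k + 1, m)))
                ref[j] = norms[j]
            else:
                norms[j] *= math.sqrt(t)

    diag = [abs(R[k][k]) for k in range(min(m, n))]
    rank = sum(1 for d in diag if diag[0] > 0.0 and d > tol * diag[0])
    return R, perm, rank
```

For example, with `A = [[1, 2, 3], [4, 5, 9], [7, 8, 15], [10, 11, 21]]`, whose third column is the sum of the first two, the sketch picks that column first (it has the largest norm) and reports a numerical rank of 2. Each pivot choice needs the updated norms of all remaining columns, so the trailing matrix is updated after every step; this step-by-step dependence is, roughly, what keeps the classical algorithm in Level 1 and Level 2 BLAS territory.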