A BLAS-3 Version of the QR Factorization with Column Pivoting

Authors:
Gregorio Quintana-Ortí;Xiaobai Sun;Christian H. Bischof
Venue:
SIAM Journal on Scientific Computing
Year:
1998

Cited by 11:

Computing rank-revealing QR factorizations of dense matrices
ACM Transactions on Mathematical Software (TOMS)

Parallel Partial Stabilizing Algorithms for Large Linear Control Systems
The Journal of Supercomputing

Efficient Algorithms for the Block Hessenberg Form
The Journal of Supercomputing

Parallel Pole Assignment of Single-Input Systems
VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing

State-space truncation methods for parallel model reduction of large-scale systems
Parallel Computing - Special issue: Parallel and distributed scientific and engineering computing

Algorithm 853: An efficient algorithm for solving rank-deficient least squares problems
ACM Transactions on Mathematical Software (TOMS)

Partial stabilisation of large-scale discrete-time linear control systems
International Journal of Computational Science and Engineering

On the Failure of Rank-Revealing QR Factorization Software -- A Case Study
ACM Transactions on Mathematical Software (TOMS)

Solving linear-quadratic optimal control problems on parallel computers
Optimization Methods & Software

Computing approximate Fekete points by QR factorizations of Vandermonde matrices
Computers & Mathematics with Applications

Parallel model reduction of large linear descriptor systems via balanced truncation
VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science

Abstract

The QR factorization with column pivoting (QRP), originally suggested by Golub [Numer. Math., 7 (1965), 206--216], is a popular approach to computing rank-revealing factorizations. Using Level 1 BLAS, it was implemented in LINPACK, and, using Level 2 BLAS, in LAPACK. While the Level 2 BLAS version delivers superior performance in general, it may result in worse performance for large matrix sizes due to cache effects. We introduce a modification of the QRP algorithm which allows the use of Level 3 BLAS kernels while maintaining the numerical behavior of the LINPACK and LAPACK implementations. Experimental comparisons of this approach with the LINPACK and LAPACK implementations on IBM RS/6000, SGI R8000, and DEC AXP platforms show considerable performance improvements.
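
For readers unfamiliar with QRP, the following is a minimal sketch of the classical, unblocked pivoting strategy that the abstract takes as its starting point. It is not the Level 3 BLAS modification introduced in the paper, and it is not code from LINPACK or LAPACK. The function names qr_column_pivoting and numerical_rank, the tolerance, and the test matrix are all hypothetical choices made for illustration. For clarity the sketch recomputes the trailing column norms at every step rather than updating them, and it returns only R and the column permutation, leaving Q implicit.

```python
import math

def qr_column_pivoting(a):
    """Householder QR with column pivoting (illustrative sketch).

    Returns (r, perm) such that the columns of A taken in the order
    perm equal Q @ R for some orthogonal Q, which is not formed here.
    """
    m, n = len(a), len(a[0])
    r = [[float(t) for t in row] for row in a]   # work on a copy
    perm = list(range(n))
    for k in range(min(m, n)):
        # Squared norms of the trailing parts (rows k..m-1) of columns k..n-1.
        # Recomputed each step for clarity; an assumption of this sketch.
        norms = [sum(r[i][j] ** 2 for i in range(k, m)) for j in range(k, n)]
        p = k + max(range(len(norms)), key=norms.__getitem__)
        pivot_norm2 = norms[p - k]
        if pivot_norm2 == 0.0:
            break                                # remaining block is exactly zero
        if p != k:                               # move the pivot column to position k
            for row in r:
                row[k], row[p] = row[p], row[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Householder vector v: H = I - 2 v v^T / (v^T v) maps x to alpha * e_1.
        x = [r[i][k] for i in range(k, m)]
        alpha = -math.copysign(math.sqrt(pivot_norm2), x[0])
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        # Apply H to columns k..n-1 (a matrix-vector style update).
        for j in range(k, n):
            s = 2.0 * sum(v[i] * r[k + i][j] for i in range(m - k)) / vtv
            for i in range(m - k):
                r[k + i][j] -= s * v[i]
    # Clear rounding residue below the diagonal so r is exactly upper triangular.
    for i in range(m):
        for j in range(min(i, n)):
            r[i][j] = 0.0
    return r, perm

def numerical_rank(r, tol=1e-10):
    """Count diagonal entries of R that are not negligible relative to R[0][0]."""
    d = [abs(r[k][k]) for k in range(min(len(r), len(r[0])))]
    if not d or d[0] == 0.0:
        return 0
    return sum(1 for t in d if t > tol * d[0])

# Hypothetical 4x3 test matrix of rank 2 (columns form an arithmetic progression).
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
R, perm = qr_column_pivoting(A)
print("column order:", perm)
print("estimated rank:", numerical_rank(R))
```

Because each step updates the trailing columns one Householder reflector at a time, the work in this sketch is dominated by matrix-vector style operations, which is the kind of memory traffic the abstract associates with cache effects on large matrices.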