This work revisits existing algorithms for the QR factorization of rectangular matrices composed of p × q tiles, where p ≥ q. Within this framework, we study the critical paths and performance of algorithms such as Sameh-Kuck, Fibonacci, Greedy, and those found within PLASMA. Although neither Fibonacci nor Greedy is optimal, both are shown to be asymptotically optimal for all matrices of size p = q² f(q), where f is any function such that lim_{q→+∞} f(q) = 0. This novel and important complexity result applies to all matrices where p and q are proportional, p = λq, with λ ≥ 1, thereby encompassing many important situations in practice (least squares). We provide an extensive set of experiments that show the superiority of the new algorithms for tall matrices.
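The claim that the proportional case p = λq is covered can be checked in one step. The short derivation below is a sketch rather than a passage from the paper; the particular choice f(q) = λ/q is an assumption made here for illustration, and any f that vanishes at infinity would serve.

```latex
% Sketch: the proportional case p = \lambda q fits the form p = q^2 f(q).
% The choice f(q) = \lambda / q is an illustrative assumption, not taken from the paper.
\[
  p = \lambda q = q^{2} \cdot \frac{\lambda}{q}
  \quad\Longrightarrow\quad
  f(q) = \frac{\lambda}{q},
  \qquad
  \lim_{q \to +\infty} f(q) = \lim_{q \to +\infty} \frac{\lambda}{q} = 0
  \quad \text{for any fixed } \lambda \ge 1 .
\]
```

Because f(q) = λ/q tends to 0 for every fixed λ, matrices whose tile counts satisfy p = λq fall within the class p = q² f(q) for which the abstract states asymptotic optimality.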