The WY representation for products of householder matrices
SIAM Journal on Scientific and Statistical Computing - Papers from the Second Conference on Parallel Processing for Scientific Computin
An extended set of FORTRAN basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
A storage-efficient WY representation for products of householder transformations
SIAM Journal on Scientific and Statistical Computing
A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
Fundamentals of matrix computations
Fundamentals of matrix computations
LAPACK's user's guide
Scalability issues affecting the design of a dense linear algebra library
Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
The torus-wrap mapping for dense matrix calculations on massively parallel computers
SIAM Journal on Scientific Computing
Proceedings of the fourth workshop on I/O in parallel and distributed systems: part of the federated computing research conference
Matrix computations (3rd ed.)
Using PLAPACK: parallel linear algebra package
Using PLAPACK: parallel linear algebra package
Basic Linear Algebra Subprograms for Fortran Usage
ACM Transactions on Mathematical Software (TOMS)
FLAME: Formal Linear Algebra Methods Environment
ACM Transactions on Mathematical Software (TOMS)
PLAPACK: parallel linear algebra package design overview
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
MPI: The Complete Reference
Solving Linear Systems on Vector and Shared Memory Computers
Solving Linear Systems on Vector and Shared Memory Computers
Very large electronic structure calculations using an out-of-core filter-diagonalization method
Journal of Computational Physics
Parallel Out-of-Core Cholesky and QR Factorization with POOCLAPACK
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems
PARA '98 Proceedings of the 4th International Workshop on Applied Parallel Computing, Large Scale Scientific and Industrial Problems
Formal derivation of algorithms: The triangular sylvester equation
ACM Transactions on Mathematical Software (TOMS)
The Design and Implementation of the Parallel Out-of-coreScaLAPACK LU, QR, and Cholesky Factorization Routines
Efficient Parallel Out-of-Core Implementation of the Cholesky Factorization
Efficient Parallel Out-of-Core Implementation of the Cholesky Factorization
POOCLAPACK: Parallel Out-of-Core Linear Algebra Package
POOCLAPACK: Parallel Out-of-Core Linear Algebra Package
Applying recursion to serial and parallel QR factorization leads to better performance
IBM Journal of Research and Development
An FPGA-based computation model for blocked algorithms
AIC'06 Proceedings of the 6th WSEAS International Conference on Applied Informatics and Communications
Updating an LU Factorization with Pivoting
ACM Transactions on Mathematical Software (TOMS)
Implementing a parallel matrix factorization library on the cell broadband engine
Scientific Programming - High Performance Computing with the Cell Broadband Engine
QR factorization for the Cell Broadband Engine
Scientific Programming - High Performance Computing with the Cell Broadband Engine
Programming matrix algorithms-by-blocks for thread-level parallelism
ACM Transactions on Mathematical Software (TOMS)
Out-of-Core Computation of the QR Factorization on Multi-core Processors
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Scaling LAPACK panel operations using parallel cache assignment
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Parallel tiled QR factorization for multicore architectures
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Scheduling two-sided transformations using tile algorithms on multicore architectures
Scientific Programming
Managing the complexity of lookahead for LU factorization with pivoting
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Journal of Computational and Applied Mathematics
Using desktop computers to solve large-scale dense linear algebra problems
The Journal of Supercomputing
High-performance up-and-downdating via householder-like transformations
ACM Transactions on Mathematical Software (TOMS)
Rapid development of high-performance out-of-core solvers
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Concurrency and Computation: Practice & Experience
ACM Transactions on Mathematical Software (TOMS)
Communication-optimal Parallel and Sequential QR and LU Factorizations
SIAM Journal on Scientific Computing
Scaling LAPACK panel operations using parallel cache assignment
ACM Transactions on Mathematical Software (TOMS)
Scalable matrix decompositions with multiple cores on FPGAs
Microprocessors & Microsystems
Hi-index | 0.00 |
This article discusses the high-performance parallel implementation of the computation and updating of QR factorizations of dense matrices, including problems large enough to require out-of-core computation, where the matrix is stored on disk. The algorithms presented here are scalable both in problem size and as the number of processors increases. Implementation using the Parallel Linear Algebra Package (PLAPACK) and the Parallel Out-of-Core Linear Algebra Package (POOCLAPACK) is discussed. The methods are shown to attain excellent performance, in some cases attaining roughly 80&percent; of the “realizable” peak of the architectures on which the experiments were performed.