Parallel out-of-core computation and updating of the QR factorization

Authors:
Brian C. Gunter; Robert A. Van De Geijn
Affiliations:
The University of Texas at Austin, Austin, TX; The University of Texas at Austin, Austin, TX
Venue:
ACM Transactions on Mathematical Software (TOMS)
Year:
2005

Abstract

This article discusses the high-performance parallel implementation of the computation and updating of QR factorizations of dense matrices, including problems large enough to require out-of-core computation, in which the matrix is stored on disk. The algorithms presented here scale both with problem size and with the number of processors. Implementations using the Parallel Linear Algebra Package (PLAPACK) and the Parallel Out-of-Core Linear Algebra Package (POOCLAPACK) are discussed. The methods are shown to attain high performance, in some cases reaching roughly 80% of the “realizable” peak of the architectures on which the experiments were performed.
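The abstract names two operations: computing a QR factorization and updating it. The following is a minimal illustration of the updating idea. It assumes that "updating" means appending new rows B to a matrix A whose triangular factor R is already known, which is the usual meaning but is not spelled out in this abstract. Instead of refactoring the stacked matrix [A; B], one can factor the smaller stacked matrix [R; B]. Because [A; B] = diag(Q, I) [R; B], the resulting triangular factor is a valid R for the enlarged matrix. This is a serial, in-core, unblocked Householder QR in plain Python. The helper names `householder_r`, `update_r`, and `gram` are hypothetical. It is not the authors' PLAPACK/POOCLAPACK implementation, which is parallel, blocked, and able to work on matrices stored on disk.

```python
import math


def householder_r(a):
    """Return the n-by-n upper-triangular factor R of an m-by-n matrix a
    (m >= n, given as a list of row lists) using unblocked Householder QR.
    Q is never formed explicitly."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]  # work on a copy so the input is untouched
    for j in range(n):
        # Column j from the diagonal down is the vector x to be reflected.
        x = [r[i][j] for i in range(j, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # nothing to annihilate in this column
        # Choose alpha with the sign opposite to x[0] to avoid cancellation.
        alpha = -norm_x if x[0] >= 0 else norm_x
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)  # strictly positive by the sign choice
        # Apply H = I - 2 v v^T / (v^T v) to columns j..n-1, rows j..m-1.
        for k in range(j, n):
            s = sum(v[i - j] * r[i][k] for i in range(j, m))
            f = 2.0 * s / vtv
            for i in range(j, m):
                r[i][k] -= f * v[i - j]
    # Keep the top n rows and store exact zeros below the diagonal.
    return [[r[i][k] if k >= i else 0.0 for k in range(n)] for i in range(n)]


def update_r(r, b):
    """Given an R factor of A, return an R factor of [A; B] by factoring
    the stacked (n + k)-by-n matrix [R; B]. This sketch ignores the
    triangular structure of R, which a tuned implementation would exploit."""
    return householder_r([row[:] for row in r] + [row[:] for row in b])


def gram(r):
    """Return R^T R, which equals A^T A for any QR factor R of A."""
    n = len(r[0])
    return [[sum(row[p] * row[q] for row in r) for q in range(n)]
            for p in range(n)]
```

The check used here compares R^T R rather than R itself. A QR factor is unique only up to the signs of its rows, while R^T R = A^T A holds for every valid factorization.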