Can dataflow subsume von Neumann computing?
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Improving locality and parallelism in nested loops
ScaLAPACK user's guide
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs
International Journal of Parallel Programming
Maximizing parallelism and minimizing synchronization with affine partitions
Parallel Computing - Special issues on languages and compilers for parallel computers
Algorithmic Redistribution Methods for Block-Cyclic Decompositions
IEEE Transactions on Parallel and Distributed Systems
Efficient Algorithms for Block-Cyclic Array Redistribution Between Processor Sets
IEEE Transactions on Parallel and Distributed Systems
Advanced Computer Architecture: Parallelism, Scalability, Programmability
A Framework for Efficient Data Redistribution on Distributed Memory Multicomputers
The Journal of Supercomputing
High Performance Compilers for Parallel Computing
Grain Size Determination for Parallel Processing
IEEE Software
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
A Block QR Factorization Scheme for Loosely Coupled Systems of Array Processors
LAPACK Working Note 80: The Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines
This paper describes the design, implementation, and performance of three new parallel QR factorization algorithms: shared memory, synchronous message passing, and asynchronous message passing. In contrast to existing parallel algorithms, the multiprocessor partitioning strategy is not governed by an underlying static data distribution scheme. Rather, a dynamic distribution strategy is employed to improve scalability on small problems. Experiments conducted on a 128-processor SGI Origin 2000 and a 64-processor HP SPP-2000 show that the new algorithms have a lower execution time than the tuned parallel routines installed on those machines, including a version of ScaLAPACK's distributed QR factorization routine PDGEQRF.