Efficient Sparse LU Factorization with Partial Pivoting on Distributed Memory Architectures

Authors:
Cong Fu;Xiangmin Jiao;Tao Yang
Affiliations:
Siemens Pyramid Information Systems, San Jose, CA;Univ. of Illinois at Urbana-Champaign, Urbana, IL;Univ. of California, Santa Barbara
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1998

Citing 25
Cited 5

Computational models and task scheduling for parallel sparse Cholesky factorization

Parallel Computing
Symbolic factorization for sparse Gaussian elimination with partial pivoting

SIAM Journal on Scientific and Statistical Computing
Algorithm 656: an extended set of basic linear algebra subprograms: model implementation and test programs

ACM Transactions on Mathematical Software (TOMS)
The influence of relaxed supernode partitions on the multifrontal method

ACM Transactions on Mathematical Software (TOMS)
Parallel sparse Gaussian elimination with partial pivoting

Annals of Operations Research
Parallel algorithms for sparse linear systems

SIAM Review
PYRROS: static task scheduling and code generation for message passing multiprocessors

ICS '92 Proceedings of the 6th international conference on Supercomputing
Scientific computing: an introduction with parallel computing
Exploiting the memory hierarchy in sequential and parallel sparse Cholesky factorization
The parallel solution of nonsymmetric sparse linear systems using the H* reordering and an associated factorization

ICS '94 Proceedings of the 8th international conference on Supercomputing
Distributed sparse Gaussian elimination and orthogonal factorization

SIAM Journal on Scientific Computing
Decoupling synchronization and data transfer in message passing systems of parallel computers

ICS '95 Proceedings of the 9th international conference on Supercomputing
Parallel Sparse Orthogonal Factorization on Distributed-Memory Multiprocessors

SIAM Journal on Scientific Computing
Synchronization and communication in the T3E multiprocessor

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Run-time compilation for parallel sparse matrix computations

ICS '96 Proceedings of the 10th international conference on Supercomputing
An Unsymmetric-Pattern Multifrontal Method for Sparse LU Factorization

SIAM Journal on Matrix Analysis and Applications
Space and time efficient execution of parallel irregular computations

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
On Algorithms for Obtaining a Maximum Transversal

ACM Transactions on Mathematical Software (TOMS)
The Multifrontal Solution of Indefinite Sparse Symmetric Linear

ACM Transactions on Mathematical Software (TOMS)
Sparse LU factorization with partial pivoting on distributed memory machines

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Improved load distribution in parallel sparse Cholesky factorization

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
On the Granularity and Clustering of Directed Acyclic Task Graphs

IEEE Transactions on Parallel and Distributed Systems
Efficient Run-Time Support for Irregular Task Computations with Mixed Granularities

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian
A Supernodal Approach to Sparse Partial Pivoting

Elimination forest guided 2D sparse LU factorization

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Analysis and comparison of two general sparse solvers for distributed memory computers

ACM Transactions on Mathematical Software (TOMS)
Making sparse Gaussian elimination scalable by static pivoting

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Parallel Pivots LU Algorithm on the Cray T3E

ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems

ACM Transactions on Mathematical Software (TOMS)

Abstract

Sparse LU factorization based on Gaussian elimination with partial pivoting (GEPP) is important to many scientific applications, but developing a high-performance GEPP code for distributed memory machines remains an open problem. The main difficulty is that partial pivoting dynamically changes the computation and the nonzero fill-in structure during elimination. This paper presents an approach called S* for parallelizing this problem on distributed memory machines. S* adopts static symbolic factorization to avoid run-time control overhead, incorporates 2D L/U supernode partitioning and amalgamation strategies to improve caching performance, and exploits the irregular task parallelism embedded in sparse LU using asynchronous computation scheduling. The paper discusses and compares algorithms using 1D and 2D data mapping schemes and presents experimental studies on the Cray T3D and T3E. The performance results for a set of nonsymmetric benchmark matrices are very encouraging: S* has achieved up to 6.878 GFLOPS on 128 T3E nodes. To the best of our knowledge, this is the highest performance achieved to date for this challenging problem; the previous record was 2.583 GFLOPS on shared memory machines [8].
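
The difficulty the abstract describes, that pivot choices change the fill-in structure at run time, can be seen on a toy problem. The following is a minimal dense sketch in Python, not the S* algorithm or any code from the paper; the function names and the two 3x3 test matrices are illustrative assumptions. Both matrices have the same nonzero pattern, yet their factors end up with different numbers of nonzeros because partial pivoting selects different pivot rows.

```python
def lu_partial_pivoting(a):
    """Dense GEPP on a list-of-lists matrix (illustrative, hypothetical helper).

    Returns (perm, lu): perm[k] is the original index of the row that ends up
    in position k; lu stores the multipliers of L below the diagonal and U on
    and above it.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest-magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            if lu[i][k] != 0.0:
                lu[i][k] /= lu[k][k]  # multiplier, stored in place of L
                for j in range(k + 1, n):
                    # Update; this may turn a structural zero into fill-in.
                    lu[i][j] -= lu[i][k] * lu[k][j]
    return perm, lu


def count_nonzeros(m):
    """Number of entries of m that are not exactly zero."""
    return sum(1 for row in m for x in row if x != 0.0)


# Same sparsity pattern, different value in position (0, 0).
a_big = [[4.0, 1.0, 1.0], [1.0, 3.0, 0.0], [1.0, 0.0, 2.0]]
a_small = [[0.5, 1.0, 1.0], [1.0, 3.0, 0.0], [1.0, 0.0, 2.0]]

perm_big, lu_big = lu_partial_pivoting(a_big)
perm_small, lu_small = lu_partial_pivoting(a_small)

print(perm_big, count_nonzeros(lu_big))      # [0, 1, 2] 9
print(perm_small, count_nonzeros(lu_small))  # [1, 2, 0] 8
```

With the large diagonal entry, the dense first row is the pivot and both remaining rows fill in completely. With the small entry, a sparse row is chosen first and one structural zero survives. Because the outcome depends on numerical values, the factor structure cannot be known exactly before elimination; a static symbolic factorization, as named in the abstract, instead fixes a structure in advance so that this decision does not have to be made at run time.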