Highly Scalable Parallel Algorithms for Sparse Matrix Factorization

Authors:
Anshul Gupta; George Karypis; Vipin Kumar
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY; Univ. of Minnesota, Minneapolis; Univ. of Minnesota, Minneapolis
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1997

Abstract

In this paper, we describe scalable parallel algorithms for symmetric sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1,024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithms substantially improve the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is well known that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithms to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that are asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithms incur less communication overhead and are more scalable than any previously known parallel formulation of sparse matrix factorization. Although this paper discusses Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of one of our sparse Cholesky factorization algorithms delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, this is the highest performance ever obtained for sparse Cholesky factorization on any supercomputer.