SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems

Authors:
Xiaoye S. Li; James W. Demmel
Affiliations:
Lawrence Berkeley National Laboratory, Berkeley, CA; University of California at Berkeley, Berkeley, CA
Venue:
ACM Transactions on Mathematical Software (TOMS)
Year:
2003

Abstract

We present the main algorithmic features in the software package SuperLU_DIST, a distributed-memory sparse direct solver for large sets of linear equations. We give in detail our parallelization strategies, with a focus on scalability issues, and demonstrate the software's parallel performance and scalability on current machines. The solver is based on sparse Gaussian elimination, with an innovative static pivoting strategy proposed earlier by the authors. The main advantage of static pivoting over classical partial pivoting is that it permits a priori determination of data structures and communication patterns, which lets us exploit techniques used in parallel sparse Cholesky algorithms to better parallelize both LU decomposition and triangular solution on large-scale distributed machines.
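The idea of static pivoting can be illustrated with a minimal dense sketch. It is not the SuperLU_DIST implementation: the package works on sparse, distributed data and applies preprocessing that this toy omits. The function names, the choice of sqrt(eps) times the 1-norm of A as the tiny-pivot threshold, and the fixed number of refinement steps are all assumptions made for illustration. The elimination order is fixed before any numerical values are inspected, so the loop structure (and, in a sparse setting, the fill pattern and communication schedule) does not depend on the data. A pivot that is too small is perturbed rather than swapped, and iterative refinement is used to recover accuracy.

```python
import math

EPS = 2.0 ** -52  # double-precision machine epsilon


def lu_static_pivot(a):
    """Factor a square matrix (list of rows) as L*U with no row interchanges.

    Returns (lu, n_perturbed): lu packs the unit-lower L below the diagonal
    and U on and above it; n_perturbed counts the pivots that were replaced.
    """
    n = len(a)
    lu = [row[:] for row in a]
    # Assumed threshold for a "tiny" pivot: sqrt(eps) * ||A||_1.
    norm1 = max(sum(abs(lu[i][j]) for i in range(n)) for j in range(n))
    tiny = math.sqrt(EPS) * norm1
    n_perturbed = 0
    for k in range(n):
        # Static pivoting: perturb a tiny diagonal entry instead of swapping rows.
        if abs(lu[k][k]) < tiny:
            lu[k][k] = tiny if lu[k][k] >= 0 else -tiny
            n_perturbed += 1
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, n_perturbed


def lu_solve(lu, b):
    """Solve L*U*x = b by forward then backward substitution."""
    n = len(lu)
    x = b[:]
    for i in range(n):
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def solve_with_refinement(a, b, steps=5):
    """Solve a*x = b using the perturbed factors plus iterative refinement."""
    n = len(a)
    lu, n_perturbed = lu_static_pivot(a)
    x = lu_solve(lu, b)
    for _ in range(steps):
        # The residual uses the ORIGINAL matrix, which corrects for the perturbation.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = lu_solve(lu, r)
        x = [x[i] + d[i] for i in range(n)]
    return x, n_perturbed
```

For example, `solve_with_refinement([[0.0, 2.0], [1.0, 1.0]], [2.0, 2.0])` meets a zero pivot in the first column, where partial pivoting would swap the two rows. This sketch perturbs the pivot instead and relies on the refinement loop to bring the solution back toward the exact answer `[1, 1]`.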