Although Gaussian elimination with partial pivoting is a robust algorithm for solving unsymmetric sparse linear systems of equations, it is difficult to implement efficiently on parallel machines because it generates work and intermediate results dynamically and somewhat unpredictably at run time. In this paper, we present an efficient parallel algorithm that overcomes this difficulty. The high performance of our algorithm is achieved through (1) using a graph reduction technique and a supernode-panel computational kernel for high single-processor utilization, and (2) scheduling two types of parallel tasks for a high level of concurrency. One type of task factors the independent panels in the disjoint subtrees of the column elimination tree of the coefficient matrix $A$. The other updates a panel using previously computed supernodes. A scheduler assigns tasks to free processors dynamically and facilitates a smooth transition between the two types of parallel tasks. No global synchronization is used in the algorithm. The algorithm is well suited for shared-memory multiprocessors (SMPs) with a modest number of processors. We demonstrate 4- to 7-fold speedups on a range of 8-processor SMPs, and more on larger SMPs. One realistic problem arising from a 3-D flow calculation achieves factorization rates of 1.0, 2.5, 0.8, and 0.8 gigaflops on the 12-processor Power Challenge, 8-processor Cray C90, 16-processor Cray J90, and 8-processor AlphaServer 8400, respectively.
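To make the notion of "disjoint subtrees of the column elimination tree" more concrete, the following is a minimal sketch, not the paper's implementation. It assumes the standard definition of the column elimination tree as the elimination tree of $A^T A$, computed from the nonzero pattern of $A$ alone with path compression. The function names `column_etree` and `leaf_columns`, the column-wise pattern representation, and the 4-by-4 example pattern are all illustrative assumptions.

```python
def column_etree(col_rows, n_rows):
    """Return parent[] of the column elimination tree (the etree of A^T A).

    col_rows[k] lists the row indices of the nonzeros in column k of A.
    parent[k] == -1 marks a root. Only the nonzero pattern is used.
    """
    n_cols = len(col_rows)
    parent = [-1] * n_cols
    ancestor = [-1] * n_cols   # path-compressed shortcut toward the current root
    prev_col = [-1] * n_rows   # last column seen with a nonzero in each row
    for k in range(n_cols):
        for r in col_rows[k]:
            i = prev_col[r]
            # Climb from the previous column sharing row r and hang its root under k.
            while i != -1 and i < k:
                nxt = ancestor[i]
                ancestor[i] = k
                if nxt == -1:
                    parent[i] = k
                i = nxt
            prev_col[r] = k
    return parent


def leaf_columns(parent):
    """Columns with no children in the tree, i.e. with no predecessors."""
    has_child = [False] * len(parent)
    for p in parent:
        if p != -1:
            has_child[p] = True
    return [k for k, flag in enumerate(has_child) if not flag]


# Hypothetical 4x4 pattern: columns 0 and 1 share no rows with each other.
cols = [[0, 2], [1, 3], [2, 3], [3]]
parent = column_etree(cols, n_rows=4)
print(parent)                # [2, 2, 3, -1]
print(leaf_columns(parent))  # [0, 1]
```

In this example columns 0 and 1 sit in disjoint subtrees below column 2, so they could be handed to different processors at once, while column 2 must wait for both. A real solver would work on panels of several columns and supernodes rather than single columns.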