Compared to the customary column-oriented approaches, block-oriented, distributed-memory sparse Cholesky factorization benefits from an asymptotic reduction in interprocessor communication volume and an asymptotic increase in the amount of concurrency exposed in the problem. Unfortunately, block-oriented approaches (specifically, the block fan-out method) have suffered from poor balance of the computational load, and as a result their achieved performance can be quite low. This paper investigates the reasons for this load imbalance and proposes simple block mapping heuristics that dramatically reduce it. The result is a roughly 20% increase in realized parallel factorization performance, as demonstrated by results on an Intel Paragon™ system. With this technique we have achieved nearly 3.2 billion floating-point operations per second on a 196-node Paragon system.
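The kind of load imbalance described above can be illustrated with a small sketch. It assumes a 2D Cartesian-product mapping, in which block (I, J) is owned by processor (row_map[I], col_map[J]) of a pr × pc grid, and compares the conventional cyclic choice of row_map and col_map against a simple greedy "decreasing work" bin-packing heuristic. The toy work model, all function names, and the greedy heuristic are illustrative assumptions, not necessarily the mapping heuristics proposed in this paper, and the sketch ignores communication and task dependencies entirely.

```python
from typing import Dict, List, Sequence, Tuple

Work = Dict[Tuple[int, int], float]


def toy_block_work(n_blocks: int) -> Work:
    """Hypothetical work model: block (I, J) of a dense block lower triangle
    costs J + 1 units (J updates from earlier block columns plus its own
    factor/solve step). Real sparse problems are far more irregular."""
    return {(i, j): float(j + 1) for i in range(n_blocks) for j in range(i + 1)}


def cyclic_map(n: int, p: int) -> List[int]:
    """Conventional cyclic mapping: index i goes to processor row/column i mod p."""
    return [i % p for i in range(n)]


def greedy_map(totals: Sequence[float], p: int) -> List[int]:
    """Assumed 'decreasing work' heuristic: visit block rows (or columns) in
    decreasing order of total work and give each to the least-loaded bin."""
    bins = [0.0] * p
    mapping = [0] * len(totals)
    # sorted() is stable, so equal-work indices keep their original order.
    for i in sorted(range(len(totals)), key=lambda k: -totals[k]):
        target = min(range(p), key=lambda b: bins[b])  # ties go to the lowest index
        mapping[i] = target
        bins[target] += totals[i]
    return mapping


def processor_loads(work: Work, row_map: List[int], col_map: List[int],
                    pr: int, pc: int) -> List[List[float]]:
    """Sum the work of every block onto its owner in the pr x pc processor grid."""
    loads = [[0.0] * pc for _ in range(pr)]
    for (i, j), w in work.items():
        loads[row_map[i]][col_map[j]] += w
    return loads


def balance(loads: List[List[float]]) -> float:
    """Load-balance efficiency: mean processor load divided by maximum load."""
    flat = [x for row in loads for x in row]
    return (sum(flat) / len(flat)) / max(flat)


if __name__ == "__main__":
    n, pr, pc = 8, 2, 2
    work = toy_block_work(n)
    row_totals = [0.0] * n
    col_totals = [0.0] * n
    for (i, j), w in work.items():
        row_totals[i] += w
        col_totals[j] += w

    cyc = processor_loads(work, cyclic_map(n, pr), cyclic_map(n, pc), pr, pc)
    grd = processor_loads(work, greedy_map(row_totals, pr),
                          greedy_map(col_totals, pc), pr, pc)
    print("cyclic balance:", balance(cyc))
    print("greedy balance:", balance(grd))
```

On this 8 × 8 toy triangle with a 2 × 2 grid, the cyclic mapping leaves one processor with 40 units of work against a mean of 30 (efficiency 0.75), while the greedy row and column maps bring the maximum down to 31 (efficiency about 0.97). This only shows the mechanism; it says nothing about the magnitude of improvement on real sparse matrices.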