Improved load distribution in parallel sparse Cholesky factorization

Authors:
Edward Rothberg;Robert Schreiber
Affiliations:
Intel Supercomputer Systems Division, Beaverton, OR;Research Institute for Advanced Computer Science, Moffett Field, CA
Venue:
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Year:
1994

Citing 10
Cited 14

Sparse matrix test problems
ACM Transactions on Mathematical Software (TOMS)

The influence of relaxed supernode partitions on the multifrontal method
ACM Transactions on Mathematical Software (TOMS)

The role of elimination trees in sparse factorization
SIAM Journal on Matrix Analysis and Applications

Highly parallel sparse Cholesky factorization
SIAM Journal on Scientific and Statistical Computing

Exploiting the memory hierarchy in sequential and parallel sparse Cholesky factorization

An efficient block-oriented approach to parallel sparse Cholesky factorization
Proceedings of the 1993 ACM/IEEE conference on Supercomputing

Modification of the minimum-degree algorithm by multiple elimination
ACM Transactions on Mathematical Software (TOMS)

Computers and Intractability: A Guide to the Theory of NP-Completeness

An evaluation of left-looking, right-looking and multifrontal approaches to sparse Cholesky factorization on hierarchical memory machines

Massively Parallel Linpack Benchmark on the Intel Touchstone Delta and iPSC/860 Systems (Progress Report)

Highly Scalable Parallel Algorithms for Sparse Matrix Factorization
IEEE Transactions on Parallel and Distributed Systems

Space and time efficient execution of parallel irregular computations
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming

Efficient Sparse LU Factorization with Partial Pivoting on Distributed Memory Architectures
IEEE Transactions on Parallel and Distributed Systems

Elimination forest guided 2D sparse LU factorization
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures

Space/time-efficient scheduling and execution of parallel irregular computations
ACM Transactions on Programming Languages and Systems (TOPLAS)

Task scheduling using a block dependency DAG for block-oriented sparse Cholesky factorization
SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 2

Sparse LU factorization with partial pivoting on distributed memory machines
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing

PASTIX: a high-performance parallel direct solver for sparse symmetric positive definite systems
Parallel Computing - Parallel matrix algorithms and applications

Efficient Run-Time Support for Irregular Task Computations with Mixed Granularities
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium

PaStiX: A Parallel Sparse Direct Solver Based on a Static Scheduling for Mixed 1D/2D Block Distributions
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing

A Mapping and Scheduling Algorithm for Parallel Sparse Fan-In Numerical Factorization
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing

Task scheduling using a block dependency DAG for block-oriented sparse Cholesky factorization
Parallel Computing

A statistically-based multi-algorithmic approach for load-balancing sparse matrix computations
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation

Adaptive runtime tuning of parallel sparse matrix-vector multiplication on distributed memory systems
Proceedings of the 22nd annual international conference on Supercomputing

Abstract

Compared to the customary column-oriented approaches, block-oriented, distributed-memory sparse Cholesky factorization benefits from an asymptotic reduction in interprocessor communication volume and an asymptotic increase in the amount of concurrency that is exposed in the problem. Unfortunately, block-oriented approaches (specifically, the block fan-out method) have suffered from poor balance of the computational load. As a result, achieved performance can be quite low. This paper investigates the reasons for this load imbalance and proposes simple block mapping heuristics that dramatically improve it. The result is a roughly 20% increase in realized parallel factorization performance, as demonstrated by performance results from an Intel Paragon™ system. We have achieved performance of nearly 3.2 billion floating point operations per second with this technique on a 196-node Paragon system.
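The abstract does not spell out the mapping heuristics, so the following is only an illustrative sketch of the kind of problem it describes. It assumes a two-dimensional processor grid in which block row i goes to a processor row and block column j goes to a processor column, and it compares a plain cyclic assignment with a greedy "heaviest first, least-loaded owner" assignment. The greedy rule is a standard load-balancing heuristic used here for illustration, not necessarily the one the paper proposes. All names (`cyclic_map`, `greedy_map`, `balance`, `synthetic_work`) and the synthetic work model are hypothetical.

```python
def cyclic_map(n_blocks, p):
    """Assign block index i to processor index i mod p."""
    return [i % p for i in range(n_blocks)]


def greedy_map(work, p):
    """Assign indices in decreasing order of work to the least-loaded owner.

    Illustrative heuristic only; not taken from the paper.
    """
    load = [0.0] * p
    owner = [0] * len(work)
    for i in sorted(range(len(work)), key=lambda i: -work[i]):
        q = min(range(p), key=lambda k: load[k])  # least-loaded owner so far
        owner[i] = q
        load[q] += work[i]
    return owner


def balance(block_work, row_owner, col_owner, pr, pc):
    """Return total work / (P * max per-processor work); 1.0 is perfect balance."""
    load = [[0.0] * pc for _ in range(pr)]
    for (i, j), w in block_work.items():
        load[row_owner[i]][col_owner[j]] += w
    total = sum(sum(r) for r in load)
    peak = max(max(r) for r in load)
    return total / (pr * pc * peak)


def synthetic_work(n):
    """Hypothetical work model: lower-triangular blocks, cost grows with column index."""
    return {(i, j): float(j + 1) for i in range(n) for j in range(i + 1)}


if __name__ == "__main__":
    n, pr, pc = 24, 4, 4
    work = synthetic_work(n)
    # Aggregate work per block row and per block column.
    row_work = [sum(w for (i, _), w in work.items() if i == r) for r in range(n)]
    col_work = [sum(w for (_, j), w in work.items() if j == c) for c in range(n)]

    cyc = balance(work, cyclic_map(n, pr), cyclic_map(n, pc), pr, pc)
    grd = balance(work, greedy_map(row_work, pr), greedy_map(col_work, pc), pr, pc)
    print(f"cyclic mapping balance: {cyc:.3f}")
    print(f"greedy mapping balance: {grd:.3f}")
```

The `balance` value is an upper bound on parallel efficiency under this simplified model, since the most heavily loaded processor determines the finishing time. Balancing row and column totals independently does not guarantee a better two-dimensional balance in every case, which is why the sketch reports both numbers rather than asserting an improvement.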