Sparse Cholesky factorization has historically achieved extremely low performance on distributed-memory multiprocessors. We believe that three issues must be addressed to improve this situation: (1) parallel factorization methods must be based on more efficient sequential methods; (2) parallel machines must provide higher interprocessor communication bandwidth; and (3) the sparse matrices used to evaluate parallel sparse factorization performance should be more representative of the sizes of matrices people would factor on large parallel machines. This paper demonstrates that all three of these issues have in fact already been addressed. Specifically, (1) single-node performance can be improved by moving from a column-oriented approach, where the computational kernel is level-1 BLAS, to either a panel- or block-oriented approach, where the computational kernel is level-3 BLAS; (2) communication hardware has improved dramatically, with newer parallel computers (the Intel Paragon system) providing one to two orders of magnitude higher communication bandwidth than previous parallel computers (the Intel iPSC/860 system); and (3) several larger benchmark matrices are now available, and newer parallel machines offer sufficient memory per node to factor these larger matrices. The result of addressing these three issues is extremely high performance on moderately parallel machines. This paper demonstrates performance levels of 650 double-precision Mflops on 32 nodes of the Intel Paragon system, 1 Gflop on 64 nodes, and 1.7 Gflops on 128 nodes. This paper also presents a direct performance comparison between the iPSC/860 and Paragon systems, as well as a comparison between the panel- and block-oriented approaches to parallel factorization.
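The kernel distinction the abstract draws can be made concrete with a small sketch. In supernodal sparse Cholesky, a dense column panel of the factor applies a rank-k outer-product update to a trailing submatrix; a column-oriented code performs this as many level-1 BLAS axpy operations, while a block-oriented code performs the same arithmetic as a single level-3 BLAS matrix multiply. The sketch below (illustrative only; `L_panel` and `C` are hypothetical dense sub-blocks, not the paper's code) shows that the two formulations compute the identical update:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dense sub-blocks arising in a supernodal factorization:
# L_panel holds 4 completed factor columns restricted to 8 rows, and C is
# the 8x8 trailing submatrix they update.
L_panel = rng.standard_normal((8, 4))
C = rng.standard_normal((8, 8))
C = C + C.T  # trailing submatrix is symmetric

# Column-oriented update: one level-1 BLAS axpy (DAXPY) per source/target
# column pair -- little data reuse, hence low per-node performance.
C1 = C.copy()
for k in range(L_panel.shape[1]):      # each completed column
    for j in range(L_panel.shape[0]):  # each target column it touches
        C1[:, j] -= L_panel[j, k] * L_panel[:, k]

# Block-oriented update: the same arithmetic as one level-3 BLAS rank-4
# update (DGEMM/DSYRK), which reuses L_panel from cache/registers.
C3 = C - L_panel @ L_panel.T

assert np.allclose(C1, C3)
```

Both loops subtract the same outer product `L_panel @ L_panel.T`; the performance gap comes entirely from memory-traffic behavior, which is why the paper's move to level-3 BLAS kernels raises single-node throughput.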