Data traffic reduction schemes for Cholesky factorization on asynchronous multiprocessor systems

Authors:
Vijay K. Naik;Merrell L. Patrick
Affiliations:
IBM Research, T.J. Watson Research Center, Yorktown Heights, NY;ICASE, NASA Langley Research Center, Hampton, VA and Computer Science Department, Duke University, Durham, NC
Venue:
ICS '89 Proceedings of the 3rd international conference on Supercomputing
Year:
1989

Abstract

For multiprocessor systems with a two-level memory hierarchy, the communication requirements of parallel Cholesky factorization of dense and sparse symmetric positive definite matrices are analyzed. The data traffic associated with computing the Cholesky factor of an $n \times n$ dense matrix using $n^{\alpha}$ processors, $\alpha \le 2$, is shown to be $\Omega(n^{2+\alpha/2})$, assuming that the computational load is uniformly distributed. For an $n \times n$ sparse matrix representing a $\sqrt{n} \times \sqrt{n}$ regular grid graph, the corresponding data traffic is shown to be $\Omega(n^{1+\alpha/2})$, $\alpha \le 1$. Partitioning schemes that are variations of the block assignment scheme are described. The data traffic generated by these schemes is asymptotically optimal, and the schemes allow efficient use of up to $O(n^2)$ and $O(n)$ processors in the dense and sparse cases, respectively. The block-based partitioning schemes are shown to make better use of the data accessed from shared memory and to reduce the total data traffic compared with schemes based on column-wise wrap-around assignment.
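The contrast between block-based and column-wise wrap-around assignment can be illustrated with a small counting experiment. The sketch below is a toy model and not the analysis from the paper: it assumes a dense lower-triangular factor, counts for each processor the distinct entries of L it must read but does not own, and sums these counts. The function names (`operands`, `data_traffic`, `column_wrap_owner`, `block_owner`) and the block-cyclic mapping onto a q-by-q processor grid are hypothetical choices made for illustration. The toy block mapping also does not balance the load uniformly, which the abstract's bounds assume.

```python
def operands(i, j):
    """Entries of L read when forming L[i][j] from the update sum:
    L[i][k] and L[j][k] for every k < j (the division by L[j][j] is ignored)."""
    return {(i, k) for k in range(j)} | {(j, k) for k in range(j)}


def data_traffic(n, p, owner):
    """Total number of distinct non-owned entries read, summed over processors.

    `owner(i, j)` maps a lower-triangular entry to a processor id in range(p).
    This is an illustrative cost model, not the paper's traffic measure.
    """
    owned = {r: set() for r in range(p)}
    for i in range(n):
        for j in range(i + 1):
            owned[owner(i, j)].add((i, j))

    total = 0
    for r in range(p):
        needed = set()
        for (i, j) in owned[r]:
            needed |= operands(i, j)
        # Each distinct entry is fetched once; entries the processor owns are free.
        total += len(needed - owned[r])
    return total


def column_wrap_owner(p):
    """Column-wise wrap-around: column j goes to processor j mod p."""
    return lambda i, j: j % p


def block_owner(b, q):
    """Hypothetical block mapping: b-by-b blocks dealt onto a q-by-q grid."""
    return lambda i, j: ((i // b) % q) * q + (j // b) % q


if __name__ == "__main__":
    n, p = 16, 4
    print("column wrap:", data_traffic(n, p, column_wrap_owner(p)))
    print("block      :", data_traffic(n, p, block_owner(4, 2)))
```

For this small case (n = 16, four processors, 4-by-4 blocks) the block mapping reads fewer non-owned entries than the column mapping, which is consistent in direction with the abstract's claim. The toy model says nothing about the asymptotic bounds themselves.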