For multiprocessor systems with a two-level memory hierarchy, the communication requirements of parallel Cholesky factorization of dense and sparse symmetric positive definite matrices are analyzed. The data traffic associated with computing the Cholesky factor of an $n \times n$ dense matrix using $n^{\alpha}$ processors, $\alpha \le 2$, is shown to be $\Omega(n^{2+\alpha/2})$, assuming that the computational load is uniformly distributed. For an $n \times n$ sparse matrix representing a $\sqrt{n} \times \sqrt{n}$ regular grid graph, the corresponding data traffic is shown to be $\Omega(n^{1+\alpha/2})$, $\alpha \le 1$. Partitioning schemes that are variations of the block assignment scheme are described. The data traffic generated by these schemes is asymptotically optimal, and the schemes allow efficient use of up to $O(n^2)$ and $O(n)$ processors in the dense and sparse cases, respectively. Compared with schemes based on column-wise wrap-around assignment, the block-based partitioning schemes are shown to make better use of the data accessed from shared memory and to reduce the total data traffic.
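To make the comparison between block and column-wise wrap-around assignment concrete, the following minimal sketch counts shared-memory reads for a small dense Cholesky factorization. It is an illustration under stated assumptions, not the partitioning schemes of the paper: each processor is assumed to fetch every distinct non-owned entry of the factor exactly once (unbounded local memory), the block variant simply assigns one processor per lower-triangular block, and the names `remote_reads`, `column_wrap_owner`, and `block_owner` are hypothetical.

```python
def remote_reads(n, owner):
    """Count distinct non-owned entries of L that processors must fetch.

    owner(i, j) returns the id of the processor that computes L[i, j], i >= j.
    Assumption: each processor fetches a given remote entry only once.
    """
    needed = {}  # processor id -> set of (row, col) entries of L it reads
    for j in range(n):
        for i in range(j, n):
            reads = needed.setdefault(owner(i, j), set())
            # L[i, j] = (A[i, j] - sum_{k<j} L[i, k] * L[j, k]) / L[j, j]
            reads.add((j, j))
            for k in range(j):
                reads.add((i, k))
                reads.add((j, k))
    # Only entries owned by a different processor count as data traffic.
    return sum(
        1
        for proc, reads in needed.items()
        for entry in reads
        if owner(*entry) != proc
    )


def column_wrap_owner(num_procs):
    """Column-wise wrap-around: column j goes to processor j mod num_procs."""
    return lambda i, j: j % num_procs


def block_owner(block_size):
    """Block assignment: one (hypothetical) processor per lower-triangular block."""
    return lambda i, j: (i // block_size, j // block_size)


n, block_size = 24, 6
blocks_per_side = n // block_size
# Use the same number of processors for both schemes.
num_procs = blocks_per_side * (blocks_per_side + 1) // 2

wrap_traffic = remote_reads(n, column_wrap_owner(num_procs))
block_traffic = remote_reads(n, block_owner(block_size))
print("column wrap:", wrap_traffic, "block:", block_traffic)
```

In this simplified model a processor that owns whole columns ends up reading most of the factor to the left of its columns, whereas a processor that owns a block reads only the row strips that meet its block, so the block count comes out smaller. The sketch ignores load balance, which the one-block-per-processor mapping does not provide.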