Data traffic reduction schemes for Cholesky factorization on asynchronous multiprocessor systems

  • Authors:
  • Vijay K. Naik; Merrell L. Patrick

  • Affiliations:
  • IBM Research, T.J. Watson Research Center, Yorktown Heights, NY; ICASE, NASA Langley Research Center, Hampton, VA and Computer Science Department, Duke University, Durham, NC

  • Venue:
  • ICS '89: Proceedings of the 3rd International Conference on Supercomputing
  • Year:
  • 1989

Abstract

For multiprocessor systems with a two-level memory hierarchy, the communication requirements of parallel Cholesky factorization of dense and sparse symmetric positive definite matrices are analyzed. The data traffic associated with computing the Cholesky factor of an $n \times n$ dense matrix using $n^{\alpha}$ processors, $\alpha \le 2$, is shown to be $\Omega(n^{2+\alpha/2})$, assuming that the computational load is uniformly distributed. For an $n \times n$ sparse matrix representing a $\sqrt{n} \times \sqrt{n}$ regular grid graph, the corresponding data traffic is shown to be $\Omega(n^{1+\alpha/2})$, $\alpha \le 1$.

Partitioning schemes that are variations of the block assignment scheme are described. The data traffic generated by these schemes is asymptotically optimal, and the schemes allow efficient use of up to $O(n^2)$ and $O(n)$ processors in the dense and the sparse case, respectively. The block-based partitioning schemes are shown to make better use of the data accessed from shared memory and to reduce the total data traffic compared with schemes based on column-wise wrap-around assignment.
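
The contrast between block and column-wise wrap-around assignment in the dense case can be illustrated with a small counting experiment. The sketch below is not the authors' scheme or cost model. It assumes a simplified model in which each factor entry has one owning processor, and "traffic" is the number of distinct factor entries a processor reads that it does not own. The function names (`reads_for_entry`, `traffic`, `column_wrap_owner`, `block_owner`) are hypothetical, and the plain one-block-per-processor mapping ignores the load-balancing refinements the abstract alludes to.

```python
def reads_for_entry(i, j):
    """Factor entries read when computing L[i][j] (i >= j) in dense Cholesky.

    L[i][j] depends on L[i][k] and L[j][k] for k < j, and, for an
    off-diagonal entry, on the diagonal entry L[j][j].
    """
    needed = {(i, k) for k in range(j)} | {(j, k) for k in range(j)}
    if i != j:
        needed.add((j, j))
    return needed


def traffic(owner, n):
    """Sum over processors of the distinct non-owned entries each one reads.

    `owner(i, j)` returns a hashable processor id for lower-triangle entry (i, j).
    This is an assumed, simplified traffic measure for illustration only.
    """
    fetched = {}  # processor id -> set of non-local entries it reads
    for i in range(n):
        for j in range(i + 1):
            p = owner(i, j)
            remote = {e for e in reads_for_entry(i, j) if owner(*e) != p}
            fetched.setdefault(p, set()).update(remote)
    return sum(len(entries) for entries in fetched.values())


def column_wrap_owner(num_procs):
    """Column-wise wrap-around: column j belongs to processor j mod num_procs."""
    return lambda i, j: j % num_procs


def block_owner(block):
    """Block assignment: each block-by-block tile of the lower triangle
    belongs to its own processor, identified by its tile coordinates."""
    return lambda i, j: (i // block, j // block)


if __name__ == "__main__":
    n, block = 24, 4
    tiles_per_side = n // block
    num_procs = tiles_per_side * (tiles_per_side + 1) // 2  # 21 lower-triangle tiles
    print("processors:", num_procs)
    print("column wrap traffic:", traffic(column_wrap_owner(num_procs), n))
    print("block traffic:", traffic(block_owner(block), n))
```

With the same number of processors, the block mapping reads fewer non-local entries under this model, because a tile reuses each fetched row segment for several of its own entries, whereas a column owner needs a fresh row segment for every entry in its column.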