Parallel execution time analysis for least squares problems on distributed memory architectures

  • Authors:
  • Laurence Tianruo Yang; Richard P. Brent

  • Affiliations:
  • Department of Computer Science, St. Francis Xavier University, P.O. Box 5000, Antigonish, B2G 2W5, Nova Scotia, Canada and Computing Laboratory, Oxford University, Wolfson Building, Park Road, Oxf ...;Computing Laboratory, Oxford University, Wolfson Building, Park Road, Oxford, UK

  • Venue:
  • Practical parallel computing
  • Year:
  • 2001

Abstract

In this paper we study the parallelization of PCGLS, a basic iterative method whose main idea is to organize the computation of the preconditioned conjugate gradient method applied to the normal equations. Two important questions are discussed: what is the best possible data distribution, and which communication network topology is most suitable for solving least squares problems on massively parallel distributed memory computers? A theoretical model of the data distribution and communication phases is presented, which allows us to give a detailed execution time complexity analysis and to investigate its usefulness. It is shown that implementing PCGLS with a row-block decomposition of the coefficient matrix on a ring communication structure is the most efficient choice. Performance tests of the developed parallel PCGLS algorithm have been carried out on the Parsytec massively parallel distributed memory system, and the experimental timing results are compared with the theoretical execution time complexity analysis.
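
To make the row-block idea concrete, the following is a minimal serial sketch in Python, not the authors' implementation. It assumes a simple diagonal (column-norm) preconditioner, which the abstract does not specify, and it simulates the processors with a list of row blocks. The names split_rows, ring_sum, local_matvec, local_rmatvec and pcgls are hypothetical. Each block computes its part of A t locally, while the product of the transposed matrix with the residual and the inner products are assembled by summing per-block partial results, which is the step that would travel around the ring on a real machine.

```python
import math

def split_rows(A, b, nblocks):
    """Split A and b into contiguous row blocks, one per simulated processor."""
    size = -(-len(A) // nblocks)  # ceiling division
    return [(A[i:i + size], b[i:i + size]) for i in range(0, len(A), size)]

def ring_sum(partials):
    """Stand-in for a reduction around a ring: add one partial vector per hop."""
    total = list(partials[0])
    for part in partials[1:]:
        total = [u + v for u, v in zip(total, part)]
    return total

def local_matvec(Ai, v):
    """Local product A_i v; needs no communication when v is replicated."""
    return [sum(a * x for a, x in zip(row, v)) for row in Ai]

def local_rmatvec(Ai, ri, n):
    """Local contribution A_i^T r_i to the global vector A^T r."""
    out = [0.0] * n
    for row, rk in zip(Ai, ri):
        for j, a in enumerate(row):
            out[j] += a * rk
    return out

def pcgls(A, b, nblocks=2, tol=1e-10, maxit=100):
    """Preconditioned CGLS with an assumed diagonal (column-norm) preconditioner."""
    n = len(A[0])
    blocks = split_rows(A, b, nblocks)
    mats = [Ai for Ai, _ in blocks]
    # Column norms of A, assembled from per-block partial sums of squares.
    col_sq = ring_sum([[sum(row[j] ** 2 for row in Ai) for j in range(n)]
                       for Ai in mats])
    d = [math.sqrt(c) or 1.0 for c in col_sq]  # guard against zero columns

    def scaled_gradient(r):
        # s = D^{-1} A^T r, with A^T r summed over the blocks.
        g = ring_sum([local_rmatvec(Ai, ri, n) for Ai, ri in zip(mats, r)])
        return [gj / dj for gj, dj in zip(g, d)]

    x = [0.0] * n
    r = [list(bi) for _, bi in blocks]  # x0 = 0, so the residual starts as b
    s = scaled_gradient(r)
    p = list(s)
    gamma = sum(v * v for v in s)
    for _ in range(maxit):
        if gamma <= tol ** 2:
            break
        t = [pj / dj for pj, dj in zip(p, d)]     # t = D^{-1} p (replicated)
        q = [local_matvec(Ai, t) for Ai in mats]  # q_i = A_i t (local)
        qq = ring_sum([[sum(v * v for v in qi)] for qi in q])[0]
        alpha = gamma / qq
        x = [xj + alpha * tj for xj, tj in zip(x, t)]
        r = [[rk - alpha * qk for rk, qk in zip(ri, qi)]
             for ri, qi in zip(r, q)]
        s = scaled_gradient(r)
        gamma_new = sum(v * v for v in s)
        p = [sj + (gamma_new / gamma) * pj for sj, pj in zip(s, p)]
        gamma = gamma_new
    return x
```

As a usage example, fitting a straight line to the points (0, 0), (1, 1), (2, 1), (3, 2) with pcgls([[1, 0], [1, 1], [1, 2], [1, 3]], [0, 1, 1, 2]) gives an intercept of approximately 0.1 and a slope of approximately 0.6, regardless of the number of row blocks. In this sketch each iteration needs only two reductions (one vector sum and one scalar sum), which is the communication pattern a row-block layout is meant to keep small.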