On the performance of parallel factorization of out-of-core matrices

  • Authors:
  • Eddy Caron; Gil Utard

  • Affiliations:
  • GRAAL Project, INRIA Rhône Alpes, LIP Laboratory (UMR CNRS, ENS Lyon, INRIA, Univ. Claude Bernard Lyon 1), 46 Allée d'Italie, 69364 Lyon Cedex 07, France (both authors)

  • Venue:
  • Parallel Computing
  • Year:
  • 2004

Abstract

In this paper, we present an analytical performance model of the parallel left-right looking out-of-core LU factorization algorithm for cluster-like architectures. We show the accuracy of the performance prediction model for the ScaLAPACK library. We analyze the overhead introduced by the out-of-core part of the algorithm and outline a previously unreported limitation: for large problems the algorithm exhibits poor efficiency. This overhead comprises an I/O part and a communication part. We derive an overlapping scheme and the minimum memory requirement needed to avoid the I/O overhead. The new scheme is validated by a prototype implementation in ScaLAPACK. We show the impact of the communication overhead on two-dimensional distributions, and then show that, with similar memory requirements, a second overlapping scheme can be implemented to avoid the communication overhead. If the size of the physical main memory is proportional to the matrix order (O(N) bytes), then the performance of the out-of-core algorithm is similar to that of the in-core algorithm, which requires O(N²) bytes. This paper demonstrates that there is no memory limitation for the factorization of huge matrices.
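
To make the O(N) versus O(N²) memory contrast concrete, the sketch below (not taken from the paper; the panel width, the number of resident panels, and the function names are illustrative assumptions) compares the in-core footprint of a dense double-precision N × N matrix with the footprint of an out-of-core scheme that keeps only a constant number of column panels resident, as in a panel-based factorization that stages data from disk.

```python
# Illustrative memory accounting only: it does not reproduce the paper's model,
# just the asymptotic contrast stated in the abstract (O(N^2) vs O(N) bytes).

BYTES_PER_DOUBLE = 8

def in_core_bytes(n: int) -> int:
    """Full N x N double-precision matrix resident in memory: O(N^2) bytes."""
    return n * n * BYTES_PER_DOUBLE

def out_of_core_bytes(n: int, panel_width: int, resident_panels: int = 2) -> int:
    """Only a few N x b column panels resident (e.g., the panel being factored
    plus a prefetched one): O(N) bytes when panel_width and resident_panels
    are constants (both values here are hypothetical)."""
    return n * panel_width * resident_panels * BYTES_PER_DOUBLE

if __name__ == "__main__":
    for n in (50_000, 100_000, 200_000):
        print(f"N={n}: in-core {in_core_bytes(n) / 2**30:8.1f} GiB, "
              f"out-of-core (b=64, 2 panels) "
              f"{out_of_core_bytes(n, 64) / 2**30:8.3f} GiB")
```

Because only a fixed number of panels grows with N, the out-of-core footprint scales as O(N) bytes; this is the memory regime in which, according to the abstract, the out-of-core algorithm can match the performance of the in-core one.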