Improved load distribution in parallel sparse Cholesky factorization

  • Authors: Edward Rothberg; Robert Schreiber

  • Affiliations: Intel Supercomputer Systems Division, Beaverton, OR; Research Institute for Advanced Computer Science, Moffett Field, CA

  • Venue: Proceedings of the 1994 ACM/IEEE conference on Supercomputing
  • Year: 1994

Abstract

Compared to the customary column-oriented approaches, block-oriented, distributed-memory sparse Cholesky factorization benefits from an asymptotic reduction in interprocessor communication volume and an asymptotic increase in the concurrency exposed in the problem. Unfortunately, block-oriented approaches (specifically, the block fan-out method) have suffered from poor balance of the computational load, so achieved performance can be quite low. This paper investigates the reasons for this load imbalance and proposes simple block mapping heuristics that dramatically improve load balance. The result is a roughly 20% increase in realized parallel factorization performance, as demonstrated by performance results from an Intel Paragon™ system. We have achieved performance of nearly 3.2 billion floating point operations per second with this technique on a 196-node Paragon system.
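
To make the notion of load balance under a block mapping concrete, the following is a minimal sketch, not the authors' implementation. It assumes a 2D processor grid in which block (I, J) is owned by processor (row_map[I], col_map[J]), and it compares a plain cyclic mapping with a greedy "heaviest first, lightest bin" remapping. The greedy rule, the function names, and the synthetic work values are all illustrative assumptions; the abstract does not specify the heuristics.

```python
def cyclic_map(n, p):
    """Map block index i to grid coordinate i mod p."""
    return [i % p for i in range(n)]


def greedy_map(weights, p):
    """Assign indices, heaviest first, to the currently lightest of p bins.

    Illustrative heuristic only (an assumption, not taken from the paper).
    """
    loads = [0.0] * p
    mapping = [0] * len(weights)
    for i in sorted(range(len(weights)), key=lambda k: -weights[k]):
        target = min(range(p), key=lambda b: loads[b])
        mapping[i] = target
        loads[target] += weights[i]
    return mapping


def balance(work, row_map, col_map, pr, pc):
    """Return total work / (P * max per-processor work); 1.0 is perfect."""
    loads = [[0.0] * pc for _ in range(pr)]
    for (i, j), w in work.items():
        loads[row_map[i]][col_map[j]] += w
    total = sum(work.values())
    peak = max(max(row) for row in loads)
    return total / (pr * pc * peak)


# Synthetic work per lower-triangular block (hypothetical values, chosen
# only so that work is unevenly distributed across block rows and columns).
n, pr, pc = 12, 2, 2
work = {(i, j): float((i + 1) * (j + 1)) for i in range(n) for j in range(i + 1)}

# Aggregate work per block row and per block column.
row_w = [sum(w for (i, _), w in work.items() if i == r) for r in range(n)]
col_w = [sum(w for (_, j), w in work.items() if j == c) for c in range(n)]

cyc = balance(work, cyclic_map(n, pr), cyclic_map(n, pc), pr, pc)
heu = balance(work, greedy_map(row_w, pr), greedy_map(col_w, pc), pr, pc)
print(f"cyclic balance: {cyc:.3f}, greedy balance: {heu:.3f}")
```

The balance measure here is the ratio of average to maximum per-processor work, so it bounds the parallel efficiency attainable if communication and dependencies were free. How much any remapping helps depends on the block structure of the actual matrix, which this synthetic example does not model.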