Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines

  • Authors:
  • Jaeyoung Choi; Jack J. Dongarra; L. Susan Ostrouchov; Antoine P. Petitet; David W. Walker; R. Clint Whaley

  • Affiliations:
  • Jaeyoung Choi, L. Susan Ostrouchov, Antoine P. Petitet, R. Clint Whaley: Department of Computer Science, University of Tennessee at Knoxville, 107 Ayres Hall, Knoxville, TN 37996-1301; e-mail: {choi,dongarra,sost,petitet,rwhaley}@cs.utk.edu
  • Jack J. Dongarra: Department of Computer Science, University of Tennessee at Knoxville, 107 Ayres Hall, Knoxville, TN 37996-1301, and Mathematical Sciences Section, Oak Ridge National Laboratory, P.O. Box 2008, Bldg. 6012, Oak Ridge, TN 37831-6367/ e ...
  • David W. Walker: Mathematical Sciences Section, Oak Ridge National Laboratory, P.O. Box 2008, Bldg. 6012, Oak Ridge, TN 37831-6367; e-mail: walker@rios2.epm.ornl.gov

  • Venue:
  • Scientific Programming
  • Year:
  • 1996

Abstract

This article discusses the core factorization routines included in the ScaLAPACK library. These routines factor and solve dense systems of linear equations via LU, QR, and Cholesky factorization. They are implemented using a block-cyclic data distribution and are built on de facto standard kernels for matrix and vector operations (BLAS and its parallel counterpart, PBLAS) and for message-passing communication (BLACS). In implementing the ScaLAPACK routines, a major objective was to parallelize the corresponding sequential LAPACK routines using the BLAS, BLACS, and PBLAS as building blocks, leading to straightforward parallel implementations without a significant loss in performance. We present the details of the implementation of the ScaLAPACK factorization routines, as well as performance and scalability results on the Intel iPSC/860, Intel Touchstone Delta, and Intel Paragon systems.
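
The block-cyclic data distribution mentioned in the abstract can be illustrated with a minimal Python sketch. It is not ScaLAPACK code: the function names are hypothetical, indices are 0-based, and the first block is assumed to live on process 0. Along one dimension, consecutive blocks of `nb` indices are dealt out to `nprocs` processes in round-robin order; a two-dimensional distribution applies the same rule independently to rows and columns of a process grid.

```python
def global_to_local(g, nb, nprocs):
    """Map a 0-based global index g to (owning process, local index).

    Assumes block size nb, nprocs processes along this dimension,
    and that block 0 is stored on process 0 (an assumption of this sketch).
    """
    block = g // nb                          # global block containing g
    proc = block % nprocs                    # blocks are dealt round-robin
    local = (block // nprocs) * nb + g % nb  # position in the owner's storage
    return proc, local


def local_to_global(proc, local, nb, nprocs):
    """Inverse mapping: recover the global index from (process, local index)."""
    local_block = local // nb
    return (local_block * nprocs + proc) * nb + local % nb


def owner_2d(i, j, mb, nb, prow, pcol):
    """Process-grid coordinates owning matrix entry (i, j).

    Rows use block size mb over prow process rows; columns use
    block size nb over pcol process columns.
    """
    return global_to_local(i, mb, prow)[0], global_to_local(j, nb, pcol)[0]
```

For example, with `nb = 2` and `nprocs = 3`, global indices 0 and 1 go to process 0, indices 2 and 3 to process 1, indices 4 and 5 to process 2, and indices 6 and 7 wrap around to process 0 again. Because every process receives blocks from all parts of the matrix, the work stays reasonably balanced as a factorization shrinks the active submatrix.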