A recursive formulation of Cholesky factorization of a matrix in packed storage

  • Authors:
  • Bjarne Stig Andersen;Jerzy Waśniewski;Fred G. Gustavson

  • Affiliations:
  • Danish Computing Center for Research and Education, Lyngby, Denmark;Danish Computing Center for Research and Education, Lyngby, Denmark;IBM T. J. Watson Research Center, Yorktown Heights, NY

  • Venue:
  • ACM Transactions on Mathematical Software (TOMS)
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

A new compact way to store a symmetric or triangular matrix called RPF for Recursive Packed Format is fully described. Novel ways to transform RPF to and from standard packed format are included. A new algorithm, called RPC for Recursive Packed Cholesky, that operates on the RPG format is presented. ALgorithm RPC is basd on level-3 BLAS and requires variants of algorithms TRSM and SYRK that work on RPF. We call these RP_TRSM and RP_SYRK and find that they do most of their work by calling GEMM. It follows that most of the execution time of RPC lies in GEMM. The advantage of this storage scheme compared to traditional packed and full storage is demonstrated. First, the RPC storage format uses the minimal amount of storage for the symmetric or triangular matrix. Second, RPC gives a level-3 implementation of Cholesky factorization whereas standard packed implementations are only level 2. Hence, the performance of our RPC implementation is decidedly superior. Third, unlike fixed block size algorithms, RPC, requires no block size tuning parameter. We present performance measurements on several current architectures that demonstrate improvements over the traditional packed routines. Also MSP parallel computations on the IBM SMP computer are made. The graphs that are attached in Section 7 show that the RPC algorithms are superior by a factor between 1.6 and 7.4 for order around 1000, and between 1.9 and 10.3 for order around 3000 over the traditional packed algorithms. For some architectures, the RPC performance results are almost the same or even better than the traditional full-storage algorithms results.