Rectangular full packed format for Cholesky's algorithm: factorization, solution, and inversion

Authors:
Fred G. Gustavson;Jerzy Waśniewski;Jack J. Dongarra;Julien Langou
Affiliations:
Umeå University and IBM T.J. Watson Research Center (Emeritus), Yorktown Heights, NY;Technical University of Denmark, Kongens Lyngby, Denmark;University of Tennessee, Knoxville, TN, Oak Ridge National Laboratory, and University of Manchester;University of Colorado Denver, Denver, CO
Venue:
ACM Transactions on Mathematical Software (TOMS)
Year:
2010

Abstract

We describe a new data format for storing triangular, symmetric, and Hermitian matrices called Rectangular Full Packed Format (RFPF). The standard two-dimensional arrays of Fortran and C (also known as full format) that are used to represent triangular and symmetric matrices waste nearly half of the storage space but provide high performance via the use of Level 3 BLAS. Standard packed format arrays fully utilize storage (array space) but provide low performance, as there is no Level 3 packed BLAS. RFPF combines the good features of packed and full storage: because RFPF is a standard full-format representation, it obtains high performance through Level 3 BLAS, and it requires exactly the same minimal storage as the packed format. Each LAPACK full and/or packed triangular, symmetric, and Hermitian routine becomes a single new RFPF routine based on eight possible data layouts of RFPF. This new RFPF routine usually consists of two calls to the corresponding LAPACK full-format routine and two calls to Level 3 BLAS routines. This means no new software is required. As examples, we present LAPACK routines for Cholesky factorization, Cholesky solution, and Cholesky inverse computation in RFPF to illustrate this new work and to describe its performance on several commonly used computer platforms. For both serial and SMP parallel processing, LAPACK full routines using RFPF perform about the same as LAPACK full routines using the standard format while using half the storage. Vendor LAPACK full routines with RFPF are roughly 1 to 43 times faster in serial and 1 to 97 times faster in SMP parallel than vendor and/or reference packed routines.
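
The storage claim in the abstract can be made concrete with a small sketch. The abstract does not spell out the layouts, so the code below assumes one of the eight: the lower-triangular, non-transposed arrangement as it is commonly documented for LAPACK's RFP routines. The function names `rfp_shape` and `pack_lower_rfp` are hypothetical and are not part of LAPACK. The idea is that the first half of the columns of the lower triangle keeps its shape, and the trailing triangular block is transposed into the otherwise unused corner, so every slot of the rectangle is filled.

```python
def rfp_shape(n):
    """Shape (rows, cols) of the rectangle holding an n x n triangle."""
    # Either way rows * cols == n * (n + 1) // 2, the packed-format size.
    return (n + 1, n // 2) if n % 2 == 0 else (n, (n + 1) // 2)


def pack_lower_rfp(a):
    """Pack the lower triangle of square matrix `a` (a list of rows).

    Assumed layout: lower triangle, non-transposed (one of eight variants).
    """
    n = len(a)
    rows, cols = rfp_shape(n)
    n1 = cols                        # order of the leading block A11
    n2 = n - n1                      # order of the trailing block A22
    shift = 1 if n % 2 == 0 else 0   # even n leaves one extra row on top
    rfp = [[None] * cols for _ in range(rows)]
    # Columns 0..n1-1 of the lower triangle (A11 over A21) keep their shape.
    for j in range(n1):
        for i in range(j, n):
            rfp[i + shift][j] = a[i][j]
    # The lower triangle of A22 is transposed into the free upper corner.
    for j in range(n2):
        for i in range(j, n2):
            rfp[j][i + 1 - shift] = a[n1 + i][n1 + j]
    return rfp


if __name__ == "__main__":
    n = 6
    # Label each entry by its indices: a[i][j] = 10 * i + j.
    a = [[10 * i + j for j in range(n)] for i in range(n)]
    for row in pack_lower_rfp(a):
        print(row)
```

For a 6 x 6 matrix the rectangle is 7 x 3, which is 21 entries, the same as the packed format and a little over half of the 36 entries of the full format. Because the result is an ordinary two-dimensional array, its sub-blocks can be handed to full-format routines directly, which is the property the abstract relies on for Level 3 BLAS performance.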