LAPACK Users' guide (third ed.)
LAPACK Users' guide (third ed.)
A recursive formulation of Cholesky factorization of a matrix in packed storage
ACM Transactions on Mathematical Software (TOMS)
A fully portable high performance minimal storage hybrid format Cholesky algorithm
ACM Transactions on Mathematical Software (TOMS)
A new array format for symmetric and triangular matrices
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Rectangular full packed format for cholesky's algorithm: factorization, solution, and inversion
ACM Transactions on Mathematical Software (TOMS)
Minimal data copy for dense linear algebra factorization
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Three algorithms for Cholesky factorization on distributed memory using packed storage
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Cache-Oblivious algorithms and matrix formats for computations on interval matrices
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2
Hi-index | 0.00 |
We describe a new data format for storing triangular and symmetric matrices called RFP (Rectangular Full Packed). The standard two dimensional arrays of Fortran and C (also known as full format) that are used to store triangular and symmetric matrices waste nearly half the storage space but provide high performance via the use of level 3 BLAS. Standard packed format arrays fully utilize storage (array space) but provide low performance as there are no level 3 packed BLAS. We combine the good features of packed and full storage using RFP format to obtain high performance using L3 (level 3) BLAS as RFP is full format. Also, RFP format requires exactly the same minimal storage as packed format. Each full and/or packed symmetric/triangular routine becomes a single new RFP routine. We present LAPACK routines for Cholesky factorization, inverse and solution computation in RFP format to illustrate this new work and to describe its performance on the IBM, Itanium, NEC, and SUN platforms. Performance of RFP versus LAPACK full routines for both serial and SMP parallel processing is about the same while using half the storage. Performance is roughly one to a factor of 33 for serial and one to a factor of 100 for SMP parallel times faster than LAPACK packed routines. Existing LAPACK routines and vendor LAPACK routines were used in the serial and the SMP parallel study, respectively. In both studies vendor L3 BLAS were used.