Recursion leads to automatic variable blocking for dense linear-algebra algorithms
IBM Journal of Research and Development
LAPACK Users' guide (third ed.)
LAPACK Users' guide (third ed.)
PARA '98 Proceedings of the 4th International Workshop on Applied Parallel Computing, Large Scale Scientific and Industrial Problems
Recursive Blocked Data Formats and BLAS's for Dense Linear Algebra Algorithms
PARA '98 Proceedings of the 4th International Workshop on Applied Parallel Computing, Large Scale Scientific and Industrial Problems
Minimal-storage high-performance Cholesky factorization via blocking and recursion
IBM Journal of Research and Development
Communication-optimal Parallel and Sequential Cholesky Decomposition
SIAM Journal on Scientific Computing
Hi-index | 0.00 |
We present a high performance Cholesky factorization algorithm, called BPC for Blocked Packed Cholesky, which performs better or equivalent to the LAPACK DPOTRF subroutine, but with about the same memory requirements as the LAPACK DPPTRF subroutine, which runs at level 2 BLAS speed. Algorithm BPC only calls DGEMM and level 3 kernel routines. It combines a recursive algorithm with blocking and a recursive packed data format. A full analysis of overcoming the non-linear addressing overhead imposed by recursion is given and discussed. Finally, since BPC uses GEMM to a great extent, we easily get a considerable amount of SMP parallelism from an SMP GEMM.