We target the development of high-performance algorithms for dense matrix operations where the data resides on disk and must be explicitly moved into and out of main memory. We provide strong evidence that, even for an operation as complex as the QR factorization, a run-time system separates the matrix computations from the I/O operations, so that existing in-core algorithms need no significant changes. The library developer can thus focus on designing algorithms-by-blocks, treating disk as just another level of the memory hierarchy. Experimental results for the out-of-core computation of the QR factorization on a multi-core processor reveal the potential of this approach.
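The separation of concerns described above can be illustrated with a minimal sketch. It is not the run-time system or the algorithm from this work. It only shows the general idea of keeping disk traffic in one layer and the numerical kernel in another. The class `DiskMatrix` and the functions `fold_row_into_r` and `out_of_core_r_factor` are hypothetical names introduced here. The kernel is a plain Givens-rotation update that computes only the R factor of a tall matrix, swapped in for brevity in place of a tiled Householder QR.

```python
import math
import os
import struct
import tempfile


class DiskMatrix:
    """Hypothetical I/O layer: a row-major matrix of doubles in a flat file."""

    def __init__(self, path, n_cols):
        self.path = path
        self.n_cols = n_cols
        self._row_fmt = "<%dd" % n_cols
        self._row_bytes = struct.calcsize(self._row_fmt)
        open(self.path, "ab").close()  # make sure the file exists

    def append_rows(self, rows):
        """Write rows to the end of the on-disk matrix."""
        with open(self.path, "ab") as f:
            for row in rows:
                f.write(struct.pack(self._row_fmt, *row))

    def n_rows(self):
        return os.path.getsize(self.path) // self._row_bytes

    def read_block(self, start, count):
        """Bring `count` rows starting at row `start` into main memory."""
        with open(self.path, "rb") as f:
            f.seek(start * self._row_bytes)
            data = f.read(count * self._row_bytes)
        return [list(r) for r in struct.iter_unpack(self._row_fmt, data)]


def fold_row_into_r(R, row):
    """In-core kernel: absorb one row into upper-triangular R (Givens rotations).

    This function knows nothing about files or disk.
    """
    n = len(R)
    row = list(row)
    for j in range(n):
        a, b = R[j][j], row[j]
        if b == 0.0:
            continue  # entry already zero, no rotation needed
        r = math.hypot(a, b)
        c, s = a / r, b / r
        for k in range(j, n):
            t = R[j][k]
            R[j][k] = c * t + s * row[k]
            row[k] = -s * t + c * row[k]


def out_of_core_r_factor(store, block_rows):
    """Driver: stream row blocks from disk and hand them to the in-core kernel.

    Only one block of `block_rows` rows plus the n-by-n factor R is held
    in memory at any time.
    """
    n = store.n_cols
    R = [[0.0] * n for _ in range(n)]
    total = store.n_rows()
    for start in range(0, total, block_rows):
        count = min(block_rows, total - start)
        for row in store.read_block(start, count):
            fold_row_into_r(R, row)
    return R


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "a.bin")
    store = DiskMatrix(path, n_cols=2)
    store.append_rows([[3.0, 1.0], [4.0, 2.0], [0.0, 5.0]])
    for r_row in out_of_core_r_factor(store, block_rows=2):
        print(r_row)
```

In this sketch the kernel could be replaced by any in-core routine without touching `DiskMatrix`, and the storage layout could change without touching the kernel. A real run-time system would additionally cache blocks and overlap I/O with computation, which this example does not attempt.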