LAPACK Users' Guide
ScaLAPACK Users' Guide
On the Automatic Parallelization of the Perfect Benchmarks®
IEEE Transactions on Parallel and Distributed Systems
Optimizing compilers for modern architectures: a dependence-based approach
Accuracy and Stability of Numerical Algorithms
Doany: Not Just Another Parallel Loop
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Graph theory: An algorithmic approach (Computer science and applied mathematics)
SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Families of algorithms related to the inversion of a Symmetric Positive Definite matrix
ACM Transactions on Mathematical Software (TOMS)
Parallel tiled QR factorization for multicore architectures
Concurrency and Computation: Practice & Experience
QR factorization for the Cell Broadband Engine
Scientific Programming - High Performance Computing with the Cell Broadband Engine
Programming matrix algorithms-by-blocks for thread-level parallelism
ACM Transactions on Mathematical Software (TOMS)
Comparative study of one-sided factorizations with multiple software packages on multi-core hardware
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
High performance matrix inversion based on LU factorization for multicore architectures
Proceedings of the 2011 ACM International Workshop on Many Task Computing on Grids and Supercomputers
Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms
ACM Transactions on Mathematical Software (TOMS)
The algorithms in current sequential numerical linear algebra libraries (e.g., LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research has shown that it is possible to write efficient and scalable tile algorithms for performing a Cholesky factorization, a (pseudo) LU factorization, a QR factorization, and for computing the inverse of a symmetric positive definite matrix. In this extended abstract, we revisit the computation of the inverse of a symmetric positive definite matrix. We observe that, using a dynamic task scheduler, it is relatively painless to translate existing LAPACK code into a ready-to-be-executed tile algorithm. However, we demonstrate that, for some variants, nontrivial compiler techniques (array renaming, loop reversal, and pipelining) then need to be applied to further increase the parallelism of the application. We present preliminary experimental results.
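To make the task structure of a tile algorithm concrete, the following is a minimal sketch (hypothetical code, not from the paper) of a right-looking Cholesky factorization, the first step of the SPD inversion discussed above. For brevity the "tiles" here are 1×1, so the four kernels reduce to scalar operations; with b×b tiles the same loop nest drives dense kernels (POTRF, TRSM, SYRK, GEMM), and each kernel call becomes a task whose read/write operands define the dependences a dynamic scheduler would track.

```python
import math

def tile_cholesky(A):
    """Right-looking Cholesky factorization, sketched with 1x1 tiles.

    Returns the lower-triangular factor L with A = L * L^T.
    Each commented kernel call corresponds to one task in the tile
    algorithm's DAG; a dynamic scheduler would run independent tasks
    (e.g. the TRSMs of a given step k) concurrently.
    """
    n = len(A)
    L = [row[:] for row in A]           # work on a copy of A
    for k in range(n):
        # POTRF(k): factor the diagonal tile.
        L[k][k] = math.sqrt(L[k][k])
        # TRSM(i, k): solve against the new diagonal factor.
        # These tasks depend only on POTRF(k) and are mutually independent.
        for i in range(k + 1, n):
            L[i][k] /= L[k][k]
        # SYRK/GEMM(i, j, k): update the trailing submatrix.
        # Task (i, j) depends on TRSM(i, k) and TRSM(j, k).
        for i in range(k + 1, n):
            for j in range(k + 1, i + 1):
                L[i][j] -= L[i][k] * L[j][k]
    # Zero the strictly upper part so L is cleanly lower triangular.
    for i in range(n):
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L
```

Note that the trailing-matrix updates of step k can overlap with the POTRF and TRSM tasks of step k+1 as soon as their operands are ready; it is exactly this pipelining across steps, rather than the loop order written above, that the dynamic scheduler exploits.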