A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
Locality of Reference in LU Decomposition with Partial Pivoting
SIAM Journal on Matrix Analysis and Applications
LAPACK Users' guide (third ed.)
LAPACK Users' guide (third ed.)
An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian Elimination
SIAM Journal on Matrix Analysis and Applications
FLAME: Formal Linear Algebra Methods Environment
ACM Transactions on Mathematical Software (TOMS)
Scheduling Linear Algebra Parallel Algorithms on MIMD Architectures
Proceedings of the Fourth SIAM Conference on Parallel Processing for Scientific Computing
New Generalized Matrix Data Structures Lead to a Variety of High-Performance Algorithms
Proceedings of the IFIP TC2/WG2.5 Working Conference on the Architecture of Scientific Software
Representing linear algebra algorithms in code: the FLAME application program interfaces
ACM Transactions on Mathematical Software (TOMS)
Parallel out-of-core computation and updating of the QR factorization
ACM Transactions on Mathematical Software (TOMS)
OpenMP issues arising in the development of parallel BLAS and LAPACK libraries
Scientific Programming - OpenMP
Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Scheduling of QR Factorization Algorithms on SMP and Multi-Core Architectures
PDP '08 Proceedings of the 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008)
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Anatomy of high-performance matrix multiplication
ACM Transactions on Mathematical Software (TOMS)
Multi-threading and one-sided communication in parallel LU factorization
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Updating an LU Factorization with Pivoting
ACM Transactions on Mathematical Software (TOMS)
Parallel tiled QR factorization for multicore architectures
Concurrency and Computation: Practice & Experience
Communication avoiding Gaussian elimination
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Programming matrix algorithms-by-blocks for thread-level parallelism
ACM Transactions on Mathematical Software (TOMS)
Hi-index | 0.00 |
We describe parallel implementations of LU factorization with pivoting for multicore architectures. Implementations that differ in two different dimensions are discussed: (1) using classical partial pivoting versus recently proposed incremental pivoting and (2) extracting parallelism only within the Basic Linear Algebra Subprograms versus building and scheduling a directed acyclic graph of tasks. Performance comparisons are given on two different systems.