A New Implementation of Sparse Gaussian Elimination
ACM Transactions on Mathematical Software (TOMS)
The Multifrontal Solution of Indefinite Sparse Symmetric Linear
ACM Transactions on Mathematical Software (TOMS)
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing
IEEE Transactions on Parallel and Distributed Systems
Impact of reordering on the memory of a multifrontal solver
Parallel Computing - Parallel matrix algorithms and applications (PMAA '02)
Programming matrix algorithms-by-blocks for thread-level parallelism
ACM Transactions on Mathematical Software (TOMS)
Parallelizing dense and banded linear algebra libraries using SMPSs
Concurrency and Computation: Practice & Experience
Multi-GPU and multi-CPU parallelization for interactive physics simulations
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures
Concurrency and Computation: Practice & Experience - Euro-Par 2009
Algorithm 915, SuiteSparseQR: Multifrontal multithreaded rank-revealing sparse QR factorization
ACM Transactions on Mathematical Software (TOMS)
DAGuE: A generic distributed DAG engine for High Performance Computing
Parallel Computing
Hi-index | 0.00 |
To face the advent of multicore processors and the ever increasing complexity of hardware architectures, programming models based on DAG parallelism regained popularity in the high performance, scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm and powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved their effectiveness on a number of dense linear algebra applications. This paper evaluates the usability of runtime systems for complex applications, namely, sparse matrix multifrontal factorizations which constitute extremely irregular workloads, with tasks of different granularities and characteristics and with a variable memory consumption. Experimental results on real-life matrices show that it is possible to achieve the same efficiency as with an ad hoc scheduler which relies on the knowledge of the algorithm. A detailed analysis shows the performance behavior of the resulting code and possible ways of improving the effectiveness of runtime systems.