Multifrontal QR factorization for multicore architectures over runtime systems

Authors:
Emmanuel Agullo;Alfredo Buttari;Abdou Guermouche;Florent Lopez
Affiliations:
LaBRI, INRIA, Bordeaux, France;CNRS, IRIT, Toulouse, France;LaBRI, Université de Bordeaux, Bordeaux, France;IRIT, Université Paul Sabatier, Toulouse, France
Venue:
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Year:
2013

Abstract

To cope with the advent of multicore processors and the ever-increasing complexity of hardware architectures, programming models based on DAG parallelism have regained popularity in the high-performance scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm, together with powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved effective on a number of dense linear algebra applications. This paper evaluates the usability of runtime systems for complex applications, namely sparse matrix multifrontal factorizations, which constitute extremely irregular workloads, with tasks of different granularities and characteristics and with variable memory consumption. Experimental results on real-life matrices show that it is possible to achieve the same efficiency as with an ad hoc scheduler that relies on knowledge of the algorithm. A detailed analysis shows the performance behavior of the resulting code and possible ways of improving the effectiveness of runtime systems.
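
As a rough illustration of the DAG-based model the abstract refers to, the following Python sketch builds a task graph from a small elimination tree and executes it in dependency order. It is not the authors' code and does not use the API of any real runtime system. The two-task decomposition per front ("assemble", "factorize"), the function names `build_dag` and `schedule`, and the example tree are all assumptions made for illustration; an actual multifrontal QR code would use a finer-grained decomposition and a parallel scheduler.

```python
from collections import deque


def build_dag(parent):
    """Build task dependencies from an elimination tree.

    `parent` maps each front to its parent front (None for a root).
    Hypothetical decomposition: each front has an "assemble" task and a
    "factorize" task. A front is factorized after it is assembled, and
    it is assembled only after all of its children are factorized.
    Returns a dict mapping each task to the set of tasks it depends on.
    """
    deps = {}
    for front in parent:
        deps[("assemble", front)] = set()
        deps[("factorize", front)] = {("assemble", front)}
    for front, par in parent.items():
        if par is not None:
            deps[("assemble", par)].add(("factorize", front))
    return deps


def schedule(deps):
    """Return one valid sequential execution order (Kahn's algorithm).

    A real runtime would hand ready tasks to worker threads; here they
    are simply appended to a list in the order they become ready.
    """
    remaining = {task: len(d) for task, d in deps.items()}
    successors = {task: [] for task in deps}
    for task, d in deps.items():
        for pred in d:
            successors[pred].append(task)

    # Tasks with no unmet dependencies are ready to run.
    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for succ in successors[task]:
            remaining[succ] -= 1
            if remaining[succ] == 0:
                ready.append(succ)

    if len(order) != len(deps):
        raise ValueError("dependency graph contains a cycle")
    return order


# Example tree (assumed): fronts 0 and 1 are children of root front 2.
example_tree = {0: 2, 1: 2, 2: None}
example_order = schedule(build_dag(example_tree))
```

In this sketch the scheduler needs no knowledge of the factorization itself: it sees only tasks and their dependencies, which is the separation of concerns that the abstract contrasts with an ad hoc, algorithm-aware scheduler.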