LAPACK Users' Guide
ScaLAPACK Users' Guide
On the Automatic Parallelization of the Perfect Benchmarks®
IEEE Transactions on Parallel and Distributed Systems
Optimizing compilers for modern architectures: a dependence-based approach
Accuracy and Stability of Numerical Algorithms
Doany: Not Just Another Parallel Loop
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Graph theory: An algorithmic approach (Computer science and applied mathematics)
SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Families of algorithms related to the inversion of a Symmetric Positive Definite matrix
ACM Transactions on Mathematical Software (TOMS)
Parallel tiled QR factorization for multicore architectures
Concurrency and Computation: Practice & Experience
QR factorization for the Cell Broadband Engine
Scientific Programming - High Performance Computing with the Cell Broadband Engine
Programming matrix algorithms-by-blocks for thread-level parallelism
ACM Transactions on Mathematical Software (TOMS)
Comparative study of one-sided factorizations with multiple software packages on multi-core hardware
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
High performance matrix inversion based on LU factorization for multicore architectures
Proceedings of the 2011 ACM International Workshop on Many Task Computing on Grids and Supercomputers
Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms
ACM Transactions on Mathematical Software (TOMS)
The algorithms in current sequential numerical linear algebra libraries (e.g., LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research has shown that it is possible to write efficient and scalable tile algorithms for performing a Cholesky factorization, a (pseudo) LU factorization, a QR factorization, and for computing the inverse of a symmetric positive definite matrix. In this extended abstract, we revisit the computation of the inverse of a symmetric positive definite matrix. We observe that, using a dynamic task scheduler, it is relatively painless to translate existing LAPACK code into a ready-to-be-executed tile algorithm. However, we demonstrate that, for some variants, nontrivial compiler techniques (array renaming, loop reversal, and pipelining) then need to be applied to further increase the parallelism of the application. We present preliminary experimental results.
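To make the task structure of a tile algorithm concrete, the following is a minimal sketch (hypothetical code, not from the paper) of a right-looking Cholesky factorization, the first step of the SPD inversion discussed above. For brevity the "tiles" here are 1×1, so the four kernels reduce to scalar operations; with b×b tiles the same loop nest drives dense kernels (POTRF, TRSM, SYRK, GEMM), and each kernel call becomes a task whose read/write operands define the dependences a dynamic scheduler would track.

```python
import math

def tile_cholesky(A):
    """Right-looking Cholesky factorization, sketched with 1x1 tiles.

    Returns the lower-triangular factor L with A = L * L^T.
    Each commented kernel call corresponds to one task in the tile
    algorithm's DAG; a dynamic scheduler would run independent tasks
    (e.g. the TRSMs of a given step k) concurrently.
    """
    n = len(A)
    L = [row[:] for row in A]           # work on a copy of A
    for k in range(n):
        # POTRF(k): factor the diagonal tile.
        L[k][k] = math.sqrt(L[k][k])
        # TRSM(i, k): solve against the new diagonal factor.
        # These tasks depend only on POTRF(k) and are mutually independent.
        for i in range(k + 1, n):
            L[i][k] /= L[k][k]
        # SYRK/GEMM(i, j, k): update the trailing submatrix.
        # Task (i, j) depends on TRSM(i, k) and TRSM(j, k).
        for i in range(k + 1, n):
            for j in range(k + 1, i + 1):
                L[i][j] -= L[i][k] * L[j][k]
    # Zero the strictly upper part so L is cleanly lower triangular.
    for i in range(n):
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L
```

Note that the trailing-matrix updates of step k can overlap with the POTRF and TRSM tasks of step k+1 as soon as their operands are ready; it is exactly this pipelining across steps, rather than the loop order written above, that the dynamic scheduler exploits.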