Families of algorithms related to the inversion of a Symmetric Positive Definite matrix

Authors:
Paolo Bientinesi; Brian Gunter; Robert A. van de Geijn
Affiliations:
RWTH Aachen University, Aachen, Germany; Delft University of Technology, Delft, The Netherlands; The University of Texas at Austin, Austin, TX
Venue:
ACM Transactions on Mathematical Software (TOMS)
Year:
2008

Abstract

We study the high-performance implementation of the inversion of a Symmetric Positive Definite (SPD) matrix on architectures ranging from sequential processors to Symmetric MultiProcessors to distributed memory parallel computers. This inversion is traditionally accomplished in three “sweeps”: a Cholesky factorization of the SPD matrix, the inversion of the resulting triangular matrix, and finally the multiplication of the inverted triangular matrix by its own transpose. We state different algorithms for each of these sweeps as well as algorithms that compute the result in a single sweep. One algorithm outperforms the current ScaLAPACK implementation by 20-30 percent due to improved load-balance on a distributed memory architecture.
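The three sweeps named in the abstract can be illustrated with a minimal, unblocked sketch in plain Python. This is a textbook formulation for small dense matrices, not any of the high-performance algorithms the paper states; the function names (`cholesky_lower`, `invert_lower`, `transpose_times_self`, `spd_inverse`) are hypothetical and chosen here for illustration only. It assumes the lower-triangular convention A = L L^T, so that the inverse is (L^{-1})^T L^{-1}.

```python
import math


def cholesky_lower(a):
    """Sweep 1: return lower-triangular L such that A = L * L^T."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def invert_lower(l):
    """Sweep 2: return X = L^{-1}, which is again lower triangular."""
    n = len(l)
    x = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x[j][j] = 1.0 / l[j][j]
        for i in range(j + 1, n):
            # Row i of L times column j of X must be zero for i > j.
            s = sum(l[i][k] * x[k][j] for k in range(j, i))
            x[i][j] = -s / l[i][i]
    return x


def transpose_times_self(x):
    """Sweep 3: return X^T * X for lower-triangular X (symmetric result)."""
    n = len(x)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # x[k][i] is zero for k < i, so the sum can start at k = i.
            s = sum(x[k][i] * x[k][j] for k in range(i, n))
            out[i][j] = s
            out[j][i] = s
    return out


def spd_inverse(a):
    """Invert an SPD matrix by running the three sweeps in sequence."""
    return transpose_times_self(invert_lower(cholesky_lower(a)))
```

For example, `spd_inverse([[4.0, 2.0], [2.0, 3.0]])` returns a matrix close to `[[0.375, -0.25], [-0.25, 0.5]]`, up to floating-point rounding. Each sweep here reads and writes the whole triangle once, which is the structure that the single-sweep algorithms mentioned in the abstract fuse into one pass.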