We study the high-performance implementation of the inversion of a Symmetric Positive Definite (SPD) matrix on architectures ranging from sequential processors to Symmetric MultiProcessors to distributed memory parallel computers. This inversion is traditionally accomplished in three “sweeps”: a Cholesky factorization of the SPD matrix, the inversion of the resulting triangular matrix, and finally the multiplication of the inverted triangular matrix by its own transpose. We present several algorithms for each of these sweeps, as well as algorithms that compute the result in a single sweep. On a distributed memory architecture, one of these algorithms outperforms the current ScaLAPACK implementation by 20-30 percent because of improved load balance.
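The three-sweep formulation described above can be illustrated with a minimal, unblocked, sequential sketch in pure Python. This is not one of the paper's algorithms and makes no attempt at high performance. It assumes the lower-triangular convention A = L Lᵀ, so that A⁻¹ = L⁻ᵀ L⁻¹, and the function names (`cholesky_lower`, `invert_lower`, `transpose_times_self`, `spd_inverse`) are hypothetical names chosen for this illustration.

```python
import math


def cholesky_lower(a):
    """Sweep 1: return lower-triangular L such that A = L L^T."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def invert_lower(l):
    """Sweep 2: return X = L^{-1}, which is also lower triangular."""
    n = len(l)
    x = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x[j][j] = 1.0 / l[j][j]
        # Forward substitution on column j of the identity L X = I.
        for i in range(j + 1, n):
            s = sum(l[i][k] * x[k][j] for k in range(j, i))
            x[i][j] = -s / l[i][i]
    return x


def transpose_times_self(x):
    """Sweep 3: return X^T X for lower-triangular X."""
    n = len(x)
    # x[k][i] is zero for k < i, so the sum can start at max(i, j).
    return [
        [sum(x[k][i] * x[k][j] for k in range(max(i, j), n)) for j in range(n)]
        for i in range(n)
    ]


def spd_inverse(a):
    """Invert an SPD matrix via the three sweeps: A^{-1} = L^{-T} L^{-1}."""
    return transpose_times_self(invert_lower(cholesky_lower(a)))


if __name__ == "__main__":
    a = [[4.0, 2.0], [2.0, 3.0]]
    for row in spd_inverse(a):
        print(row)
```

Each function corresponds to one sweep, which makes the data dependence between them explicit: every sweep reads only the triangular result of the previous one. A single-sweep algorithm would instead interleave these three computations within one pass over the matrix.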