Parallelizing dense matrix computations on distributed-memory architectures is a well-studied subject, generally considered among the best-understood domains of parallel computing. Two packages developed in the mid-1990s still enjoy regular use: ScaLAPACK and PLAPACK. With the advent of many-core architectures, which may well take the shape of distributed-memory architectures within a single processor, these packages must be revisited, since traditional MPI-based approaches will likely need to be extended. This is therefore a good time to review the lessons learned since these two packages were introduced and to propose a simple yet effective alternative. Preliminary performance results show that the new solution achieves competitive, if not superior, performance on large clusters.
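To make the distribution question concrete, the following is an illustrative sketch (not taken from the paper) of the two-dimensional block-cyclic layout that ScaLAPACK uses to map a global matrix onto a logical process grid; the `owner` helper and its parameter names are hypothetical, chosen only to show how global indices map to grid coordinates.

```python
# Illustrative sketch: 2D block-cyclic distribution, as used by ScaLAPACK.
# Blocks of size nb x nb are dealt out cyclically over a pr x pc process grid.
def owner(i, j, nb, pr, pc):
    """Return the (row, col) grid coordinates of the process that owns
    global matrix element (i, j), for block size nb on a pr x pc grid."""
    return ((i // nb) % pr, (j // nb) % pc)

# Example: an 8x8 matrix with 2x2 blocks on a 2x2 process grid.
# Each element's owner alternates in both dimensions every nb rows/columns.
layout = [[owner(i, j, 2, 2, 2) for j in range(8)] for i in range(8)]
```

Choosing such a layout balances load across the grid for algorithms that sweep through the matrix (e.g., LU and QR factorization), which is one reason block-cyclic distributions became the standard design point that packages like the one proposed here must either adopt or argue against.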