This paper outlines the content and performance of ScaLAPACK, a collection of mathematical software for linear algebra computations on distributed memory computers. The importance of developing standards for computational and message passing interfaces is discussed. We present the different components and building blocks of ScaLAPACK, and indicate the difficulties inherent in producing correct codes for networks of heterogeneous processors. Finally, this paper briefly describes future directions for the ScaLAPACK library and concludes by suggesting alternative approaches to mathematical libraries, explaining how ScaLAPACK could be integrated into efficient and user-friendly distributed systems.