This article describes the context, design, and recent development of the LAPACK for Clusters (LFC) project. LFC has been developed within the framework of Self-Adapting Numerical Software (SANS) because we believe such an approach can combine the convenience and ease of use of existing sequential environments with the power and versatility of highly tuned parallel codes that execute on clusters. Accomplishing this is far from trivial, as we argue by presenting pertinent case studies and possible usage scenarios.