Self-adapting software for numerical linear algebra and LAPACK for clusters

Authors:
Zizhong Chen; Jack Dongarra; Piotr Luszczek; Kenneth Roche
Affiliations:
Computer Science Department, Innovative Computing Laboratory, University of Tennessee, 1122 Volunteer Blvd., Suite 203, Knoxville, TN (all four authors)
Venue:
Parallel Computing - Special issue: Parallel and distributed scientific and engineering computing
Year:
2003


Abstract

This article describes the context, design, and recent development of the LAPACK for Clusters (LFC) project. LFC has been developed within the Self-Adapting Numerical Software (SANS) framework because we believe this approach can combine the convenience and ease of use of existing sequential environments with the power and versatility of highly tuned parallel codes running on clusters. As we argue in the paper through pertinent case studies and possible usage scenarios, accomplishing this is far from trivial.