LAPACK: a portable linear algebra library for high-performance computers

Authors:
E. Anderson (University of Tennessee)
Z. Bai (University of Tennessee)
J. Dongarra (University of Tennessee)
A. Greenbaum (New York University)
A. McKenney (New York University)
J. Du Croz (NAG Ltd.)
S. Hammarling (NAG Ltd.)
J. Demmel (University of California, Berkeley)
C. Bischof (Argonne National Laboratory)
D. Sorensen (Rice University)
Venue:
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Year:
1990

Abstract

The goal of the LAPACK project is to design and implement a portable linear algebra library for efficient use on a variety of high-performance computers. The library is based on the widely used LINPACK and EISPACK packages for solving linear equations, eigenvalue problems, and linear least-squares problems, but extends their functionality in a number of ways. The main technique for making the algorithms run faster is to restructure them so that their inner loops perform block matrix operations (e.g., matrix-matrix multiplication). These block operations can be optimized to exploit the memory hierarchy of a specific architecture. The LAPACK project is also developing new algorithms that yield higher relative accuracy for a variety of linear algebra problems.
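The blocking idea described above can be illustrated with Cholesky factorization, chosen here only because it needs no pivoting. The following is a minimal pure-Python sketch, not LAPACK code: the function name `blocked_cholesky`, its block-size parameter `nb`, and the right-looking ordering of the steps are assumptions made for illustration. Each pass factors a small diagonal block, solves for the panel beneath it, and then updates the trailing submatrix with one matrix-matrix product. In a real library that last step would be handed to a tuned kernel, which is where the memory-hierarchy gains would come from.

```python
import math


def blocked_cholesky(a, nb=2):
    """Return lower-triangular L with L L^T == a, computed one block column at a time.

    Sketch assumptions: `a` is symmetric positive definite, given as a list of
    lists of numbers; only its lower triangle is read. `nb` is a hypothetical
    block-size parameter (any nb >= 1 gives the same result up to rounding).
    """
    if nb < 1:
        raise ValueError("block size must be at least 1")
    n = len(a)
    a = [list(map(float, row)) for row in a]  # work on a float copy

    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)  # current block column spans k0..k1-1

        # Step 1: factor the diagonal block A11 = L11 L11^T with an
        # unblocked (vector-style) Cholesky loop.
        for j in range(k0, k1):
            d = a[j][j] - sum(a[j][p] ** 2 for p in range(k0, j))
            if d <= 0.0:
                raise ValueError("matrix is not positive definite")
            a[j][j] = math.sqrt(d)
            for i in range(j + 1, k1):
                a[i][j] = (a[i][j] - sum(a[i][p] * a[j][p] for p in range(k0, j))) / a[j][j]

        # Step 2: panel solve L21 = A21 L11^{-T}, a triangular solve with
        # many right-hand sides, for all rows below the diagonal block.
        for i in range(k1, n):
            for j in range(k0, k1):
                a[i][j] = (a[i][j] - sum(a[i][p] * a[j][p] for p in range(k0, j))) / a[j][j]

        # Step 3: trailing update A22 -= L21 L21^T (lower triangle only).
        # This matrix-matrix product carries most of the arithmetic; an
        # optimized block kernel would replace this triple loop.
        for i in range(k1, n):
            for j in range(k1, i + 1):
                a[i][j] -= sum(a[i][p] * a[j][p] for p in range(k0, k1))

    # Keep the computed lower triangle and zero the unused upper triangle.
    return [[a[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


L = blocked_cholesky([[4, 12, -16], [12, 37, -43], [-16, -43, 98]], nb=2)
print(L)  # [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
```

In pure Python the blocking brings no speedup; the point is the loop structure. Changing `nb` only reorders the arithmetic, so every block size should yield the same factor up to rounding error.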