Basic Linear Algebra Subprograms for Fortran Usage

Authors:
C. L. Lawson; R. J. Hanson; D. R. Kincaid; F. T. Krogh
Affiliations:
Jet Propulsion Laboratory, M/S 125-128, 4800 Oak Grove Drive, Pasadena, CA; Numerical Mathematics, Div. 5122, Sandia Laboratories, Albuquerque, NM; Center for Numerical Analysis, The University of Texas at Austin, Austin, TX; Jet Propulsion Laboratory, M/S 125-128, 4800 Oak Grove Drive, Pasadena, CA
Venue:
ACM Transactions on Mathematical Software (TOMS)
Year:
1979

Citing 4
Cited 254

A Portable Fortran Program to Find the Euclidean Norm of a Vector

ACM Transactions on Mathematical Software (TOMS)
A Fortran Multiple-Precision Arithmetic Package

ACM Transactions on Mathematical Software (TOMS)
Clarification of Fortran standards—second report

Communications of the ACM
Basic Linear Algebra Subprograms for FORTRAN Usage

Basic Linear Algebra Subprograms for FORTRAN Usage

Procedures for optimization problems with a mixture of bounds and general linear constraints

ACM Transactions on Mathematical Software (TOMS)
Transforming FORTRAN DO loops to improve performance on vector architectures

ACM Transactions on Mathematical Software (TOMS)
A proposal for a set of level 3 basic linear algebra subprograms

ACM SIGNUM Newsletter
An extended set of FORTRAN basic linear algebra subprograms

ACM Transactions on Mathematical Software (TOMS)
Solution of large, dense symmetric generalized eigenvalue problems using secondary storage

ACM Transactions on Mathematical Software (TOMS)
Performance of various computers using standard linear equations software in a FORTRAN environment

ACM SIGARCH Computer Architecture News
Algorithm 663: Translation of Algorithm 539: basic linear algebra subprograms for FORTRAN usage in FORTRAN 200 for the Cyber 205

ACM Transactions on Mathematical Software (TOMS)
Algorithm 666: Chabis: a mathematical software package for locating and evaluating roots of systems of nonlinear equations

ACM Transactions on Mathematical Software (TOMS)
Engineering and scientific subroutine library for the IBM 3090 vector facility

IBM Systems Journal
Programming style on the IBM 3090 vector facility considering both performance and flexibility

IBM Systems Journal
Object-oriented programming for linear algebra

OOPSLA '89 Conference proceedings on Object-oriented programming systems, languages and applications
Interprocessor communication speed and performance in distributed-memory parallel processors

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Matrix multiplication on the connection machine

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Algorithm 676: ODRPACK: software for weighted orthogonal distance regression

ACM Transactions on Mathematical Software (TOMS)
A set of level 3 basic linear algebra subprograms

ACM Transactions on Mathematical Software (TOMS)
Algorithm 686: FORTRAN subroutines for updating the QR decomposition

ACM Transactions on Mathematical Software (TOMS)
Program optimization and parallelization using idioms

POPL '91 Proceedings of the 18th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Sparse extensions to the FORTRAN Basic Linear Algebra Subprograms

ACM Transactions on Mathematical Software (TOMS)
Algorithm 692: Model implementation and test package for the Sparse Basic Linear Algebra Subprograms

ACM Transactions on Mathematical Software (TOMS)
LAPACK: a portable linear algebra library for high-performance computers

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Hierarchical blocking and data flow analysis for numerical linear algebra

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
The impact of memory organization on the performance of matrix multiplication

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Scan primitives for vector computers

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Size and access inference for data-parallel programs

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A new approach for automatic parallelization of blocked linear algebra computations

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
The K2 distributed memory parallel processor: architecture, compiler, and operating system

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
FORTRAN subroutines for general Toeplitz systems

ACM Transactions on Mathematical Software (TOMS)
An implementation of a divide and conquer algorithm for the unitary eigen problem

ACM Transactions on Mathematical Software (TOMS)
LSNNO, a FORTRAN subroutine for solving large-scale nonlinear network optimization problems

ACM Transactions on Mathematical Software (TOMS)
Performance of various computers using standard linear equations software

ACM SIGARCH Computer Architecture News
Evaluation of compiler generated parallel programs on three multicomputers

ICS '92 Proceedings of the 6th international conference on Supercomputing
Automatic data mapping for distributed-memory parallel computers

ICS '92 Proceedings of the 6th international conference on Supercomputing
Algorithm 718: A FORTRAN subroutine to solve the eigenvalue allocation problem for single-input systems

ACM Transactions on Mathematical Software (TOMS)
The role of APL and J in high-performance computation

APL '93 Proceedings of the international conference on APL
Toward parallel mathematical software for elliptic partial differential equations

ACM Transactions on Mathematical Software (TOMS)
Algorithm 728: FORTRAN subroutines for generating quadratic bilevel programming test problems

ACM Transactions on Mathematical Software (TOMS)
A parallel block implementation of Level-3 BLAS for MIMD vector processors

ACM Transactions on Mathematical Software (TOMS)
Program optimization and parallelization using idioms

ACM Transactions on Programming Languages and Systems (TOPLAS)
Conversion to Fortran 90: a case study

ACM Transactions on Mathematical Software (TOMS)
Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms

IBM Journal of Research and Development
Algorithm 737: INTLIB—a portable Fortran 77 interval standard-function library

ACM Transactions on Mathematical Software (TOMS)
Algorithm 741: least-squares solution of a linear, bordered, block-diagonal system of equations

ACM Transactions on Mathematical Software (TOMS)
Fast floating-point processing in Common Lisp

ACM Transactions on Mathematical Software (TOMS)
Algorithm 747: a Fortran subroutine to solve the eigenvalue assignment problem for multiinput systems using state feedback

ACM Transactions on Mathematical Software (TOMS)
Algorithm 640: Efficient calculation of frequency response matrices from state space models

ACM Transactions on Mathematical Software (TOMS) - The MIT Press scientific computation series
Algorithm 653: Translation of algorithm 539: PC-BLAS, basic linear algebra subprograms for FORTRAN usage with the INTEL 8087, 80287 numeric data processor

ACM Transactions on Mathematical Software (TOMS)
FORTRAN codes for estimating the one-norm of a real or complex matrix, with applications to condition estimation

ACM Transactions on Mathematical Software (TOMS)
The design of a new frontal code for solving sparse, unsymmetric systems

ACM Transactions on Mathematical Software (TOMS)
The design of MA48: a code for the direct solution of sparse unsymmetric linear systems of equations

ACM Transactions on Mathematical Software (TOMS)
Exploiting zeros on the diagonal in the direct solution of indefinite sparse symmetric linear systems

ACM Transactions on Mathematical Software (TOMS)
Parallel reduction of banded matrices to bidiagonal form

Parallel Computing
The design and implementation of SOLAR, a portable library for scalable out-of-core linear algebra computations

Proceedings of the fourth workshop on I/O in parallel and distributed systems: part of the federated computing research conference
Algorithm 767: a Fortran 77 package for column reduction of polynomial matrices

ACM Transactions on Mathematical Software (TOMS)
Use of parallel level 3 BLAS in LU factorization on three vector multiprocessors: the ALLIANT FX/80, the CRAY-2, and the IBM 3090 VF

ICS '90 Proceedings of the 4th international conference on Supercomputing
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology

ICS '97 Proceedings of the 11th international conference on Supercomputing
Practical experience in the numerical dangers of heterogeneous computing

ACM Transactions on Mathematical Software (TOMS)
CALYPSO: a computer algebra library for parallel symbolic computation

PASCO '97 Proceedings of the second international symposium on Parallel symbolic computation
Compiler blockability of dense matrix factorizations

ACM Transactions on Mathematical Software (TOMS)
Level 3 basic linear algebra subprograms for sparse matrices: a user-level interface

ACM Transactions on Mathematical Software (TOMS)
Improving the memory-system performance of sparse-matrix vector multiplication

IBM Journal of Research and Development
The automatic generation of sparse primitives

ACM Transactions on Mathematical Software (TOMS)
Restructuring the BLAS level 1 routine for computing the modified Givens transformation

ACM SIGNUM Newsletter
GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark

ACM Transactions on Mathematical Software (TOMS)
Performance comparisons of Cholesky factorization algorithms using level-2 & 3 BLAS on the national advanced systems AS/XL Vector computer

ICS '89 Proceedings of the 3rd international conference on Supercomputing
Portable and efficient factorization algorithms on the IBM 3090/VF

ICS '89 Proceedings of the 3rd international conference on Supercomputing
Squeezing the most out of an algorithm in CRAY FORTRAN

ACM Transactions on Mathematical Software (TOMS)
Matrix multiplication in an interleaved array processing architecture

ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Direct numerical simulation of turbulence with a PC/Linux cluster: fact or fiction?

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Design and Performance Evaluation of a Portable Parallel Library for Space-Time Adaptive Processing

IEEE Transactions on Parallel and Distributed Systems
Algorithm 800: Fortran 77 subroutines for computing the eigenvalues of Hamiltonian matrices. I: the square-reduced method

ACM Transactions on Mathematical Software (TOMS)
OoLALA: an object oriented analysis and design of numerical linear algebra

OOPSLA '00 Proceedings of the 15th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Algorithm 539: Basic Linear Algebra Subprograms for Fortran Usage [F1]

ACM Transactions on Mathematical Software (TOMS)
Linearly Constrained Discrete l1 Problems

ACM Transactions on Mathematical Software (TOMS)
Algorithm 576: A FORTRAN Program for Solving Ax = b [F4]

ACM Transactions on Mathematical Software (TOMS)
Algorithm 580: QRUP: A Set of FORTRAN Routines for Updating QR Factorizations [F5]

ACM Transactions on Mathematical Software (TOMS)
Algorithm 586: ITPACK 2C: A FORTRAN Package for Solving Large Sparse Linear Systems by Adaptive Accelerated Iterative Methods

ACM Transactions on Mathematical Software (TOMS)
Algorithm 587: Two Algorithms for the Linearly Constrained Least Squares Problem

ACM Transactions on Mathematical Software (TOMS)
Algorithm 589: SICEDR: A FORTRAN Subroutine for Improving the Accuracy of Computed Matrix Eigenvalues

ACM Transactions on Mathematical Software (TOMS)
Remark on “Algorithm 539: Basic Linear Algebra Subprograms for Fortran Usage”

ACM Transactions on Mathematical Software (TOMS)
Algorithm 596: a program for a locally parameterized

ACM Transactions on Mathematical Software (TOMS)
A mathematical programming updating method using modified Givens transformations and applied to LP problems

Communications of the ACM
PSBLAS: a library for parallel linear algebra computation on sparse matrices

ACM Transactions on Mathematical Software (TOMS)
ScaLAPACK: a portable linear algebra library for distributed memory computers - design issues and performance

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
NetSolve: a network server for solving computational science problems

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Automatic translation of Fortran to JVM bytecode

Proceedings of the 2001 joint ACM-ISCOPE conference on Java Grande
A graphical tool for driving the parallel computation of pseudospectra

ICS '01 Proceedings of the 15th international conference on Supercomputing
A recursive formulation of Cholesky factorization of a matrix in packed storage

ACM Transactions on Mathematical Software (TOMS)
FLAME: Formal Linear Algebra Methods Environment

ACM Transactions on Mathematical Software (TOMS)
Optimization of a parallel ocean general circulation model

SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
An updated set of basic linear algebra subprograms (BLAS)

ACM Transactions on Mathematical Software (TOMS)
Design, implementation and testing of extended and mixed precision BLAS

ACM Transactions on Mathematical Software (TOMS)
On computing Givens rotations reliably and efficiently

ACM Transactions on Mathematical Software (TOMS)
Algorithm 818: A reference model implementation of the sparse BLAS in Fortran 95

ACM Transactions on Mathematical Software (TOMS)
Preface to the special issue on the basic linear algebra subprograms (BLAS)

ACM Transactions on Mathematical Software (TOMS)
Remark on algorithm 705: A Fortran-77 software package for solving the Sylvester matrix equation AXB^T + CXD^T = E

ACM Transactions on Mathematical Software (TOMS)
Generic programming for high performance scientific applications

JGI '02 Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande
Automatic intra-register vectorization for the Intel architecture

International Journal of Parallel Programming
Automatic Intra-Register Vectorization for the Intel® Architecture

International Journal of Parallel Programming
Linear Algebra Libraries for High-Performance Computers: A Personal Perspective

IEEE Parallel & Distributed Technology: Systems & Technology
The Matrix Template Library: Generic Components for High-Performance Scientific Computing

Computing in Science and Engineering
The Decompositional Approach to Matrix Computation

Computing in Science and Engineering
Faster Numerical Algorithms Via Exception Handling

IEEE Transactions on Computers
An object-oriented programming of an explicit dynamics code: application to impact simulation

Advances in Engineering Software
Statistical Models for Automatic Performance Tuning

ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
The Design of a Parallel Adaptive Multi-level Code in Fortran 90

ICCS '02 Proceedings of the International Conference on Computational Science-Part III
A Linear Algebra Formulation for Optimising Replication in Data Parallel Programs

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
A Recursive Formulation of the Inversion of Symmetric Positive Definite Matrices in Packed Storage Data Format

PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
Code Generators for Automatic Tuning of Numerical Kernels: Experiences with FFTW

SAIG '00 Proceedings of the International Workshop on Semantics, Applications, and Implementation of Program Generation
The Matrix Template Library: A Generic Programming Approach to High Performance Numerical Linear Algebra

ISCOPE '98 Proceedings of the Second International Symposium on Computing in Object-Oriented Parallel Environments
An Evaluation of Java for Numerical Computing

ISCOPE '98 Proceedings of the Second International Symposium on Computing in Object-Oriented Parallel Environments
HPF and Numerical Libraries

ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
Blocking Techniques in Numerical Software

ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
Expressing Irregular Computations in Modern Fortran Dialects

LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
A Performance Study on a Single Processing Node of the HITACHI SR8000

NAA '00 Revised Papers from the Second International Conference on Numerical Analysis and Its Applications
Advanced environments for parallel and distributed applications: a view of current status

Parallel Computing - Special issue: Advanced environments for parallel and distributed computing
Formal derivation of algorithms: The triangular Sylvester equation

ACM Transactions on Mathematical Software (TOMS)
NetSolve: A Network-Enabled Solver: Examples and Users

HCW '98 Proceedings of the Seventh Heterogeneous Computing Workshop
Performance of various computers using standard linear equations software in a Fortran environment

ACM SIGARCH Computer Architecture News
Mathematical software: past, present, and future

Computational science, mathematics and software
Numerical algorithm delivery mechanisms

Computational science, mathematics and software
References

Sourcebook of parallel computing
Algorithm 830: Another visit with standard and modified Givens transformations and a remark on algorithm 539

ACM Transactions on Mathematical Software (TOMS)
High-performance linear algebra algorithms using new generalized data structures for matrices

IBM Journal of Research and Development
Brook for GPUs: stream computing on graphics hardware

ACM SIGGRAPH 2004 Papers
High-performance numerical algorithms and software for subspace-based linear multivariable system identification

Journal of Computational and Applied Mathematics
Development of an object-oriented finite element program: application to metal-forming and impact simulations

Journal of Computational and Applied Mathematics - Special issue: Selected papers from the 2nd international conference on advanced computational methods in engineering (ACOMEN2002) Liege University, Belgium, 27-31 May 2002
A High-Performance SIMD Floating Point Unit for BlueGene/L: Architecture, Compilation, and Algorithm Design

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Semi-formal design of reliable mesh generation systems

Advances in Engineering Software
Newton-Krylov continuation of periodic orbits for Navier-Stokes flows

Journal of Computational Physics
Supporting Cluster-Based Network Services on Functionally Symmetric Software Architecture

Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Reducing Power with Performance Constraints for Parallel Sparse Applications

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 11 - Volume 12
The science of deriving dense linear algebra algorithms

ACM Transactions on Mathematical Software (TOMS)
Representing linear algebra algorithms in code: the FLAME application program interfaces

ACM Transactions on Mathematical Software (TOMS)
Parallel out-of-core computation and updating of the QR factorization

ACM Transactions on Mathematical Software (TOMS)
Impact of the proposed IEEE floating point standard on numerical software

ACM SIGNUM Newsletter
The SLATEC mathematical subroutine library

ACM SIGNUM Newsletter
Performance of various computers using standard linear equations software in a Fortran environment

ACM SIGNUM Newsletter
A proposal for an extended set of Fortran Basic Linear Algebra Subprograms

ACM SIGNUM Newsletter
Issues relating to extension of the Basic Linear Algebra Subprograms

ACM SIGNUM Newsletter
Proposed sparse extensions to the Basic Linear Algebra Subprograms

ACM SIGNUM Newsletter
Mathematical software at KFA

ACM SIGNUM Newsletter
Programming tools for linear algebra

ACM SIGNUM Newsletter
A framework for adaptive algorithm selection in STAPL

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
A fully portable high performance minimal storage hybrid format Cholesky algorithm

ACM Transactions on Mathematical Software (TOMS)
A Neural Syntactic Language Model

Machine Learning
CONDOR, a new parallel, constrained extension of Powell's UOBYQA algorithm: experimental results and comparison with the DFO algorithm

Journal of Computational and Applied Mathematics
High Performance Linear Algebra Operations on Reconfigurable Systems

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Design patterns and Fortran 90/95

ACM SIGPLAN Fortran Forum
SmartApps: middle-ware for adaptive applications on reconfigurable platforms

ACM SIGOPS Operating Systems Review
A friendly Fortran DDE solver

Applied Numerical Mathematics - The third international conference on the numerical solutions of Volterra and delay equations, May 2004, Tempe, AZ
Mondriaan sparse matrix partitioning for attacking cryptosystems by a parallel block Lanczos algorithm: a case study

Parallel Computing - Algorithmic skeletons
An evaluation of Java for numerical computing

Scientific Programming
JLAPACK - compiling LAPACK Fortran to Java

Scientific Programming
Irregular computations in Fortran - expression and implementation strategies

Scientific Programming
Quantitative performance analysis of the SPEC OMPM2001 benchmarks

Scientific Programming - OpenMP
Design patterns for library optimization

Scientific Programming - POOSC '01 Workshop
BLASTH, a BLAS library for dual SMP computer

ALS'00 Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
Algorithm 867: QUADLOG—a package of routines for generating Gauss-related quadrature for two classes of logarithmic weight functions

ACM Transactions on Mathematical Software (TOMS)
Parallel Languages and Compilers: Perspective From the Titanium Experience

International Journal of High Performance Computing Applications
Performance of various computers using standard linear equations software in a Fortran environment

ACM SIGARCH Computer Architecture News
A highly efficient implementation of back propagation algorithm using matrix instruction set architecture

Neural, Parallel & Scientific Computations
Scalable parallelization of FLAME code via the workqueuing model

ACM Transactions on Mathematical Software (TOMS)
High performance BLAS formulation of the multipole-to-local operator in the fast multipole method

Journal of Computational Physics
Implementation and performance analysis of non-blocking collective operations for MPI

Proceedings of the 2007 ACM/IEEE conference on Supercomputing
A highly efficient implementation of a backpropagation learning algorithm using matrix ISA

Journal of Parallel and Distributed Computing
An efficient hybrid MLFMA-FFT solver for the volume integral equation in case of sparse 3D inhomogeneous dielectric scatterers

Journal of Computational Physics
The impact of paravirtualized memory hierarchy on linear algebra computational kernels and software

HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
Algorithm 887: CHOLMOD, Supernodal Sparse Cholesky Factorization and Update/Downdate

ACM Transactions on Mathematical Software (TOMS)
A simulator for adaptive parallel applications

Journal of Computer and System Sciences
Pattern-Driven Automatic Parallelization

Scientific Programming
Dynamic Supernodes in Sparse Cholesky Update/Downdate and Triangular Solves

ACM Transactions on Mathematical Software (TOMS)
The Mailman algorithm: A note on matrix-vector multiplication

Information Processing Letters
Design for Interoperability in STAPL: pMatrices and Linear Algebra Algorithms

Languages and Compilers for Parallel Computing
Adaptive Winograd's matrix multiplications

ACM Transactions on Mathematical Software (TOMS)
Solving dense linear systems on platforms with multiple hardware accelerators

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Petascale computing with accelerators

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
LAPACK-Based Condition Estimates for the Discrete-Time LQG Design

Numerical Analysis and Its Applications
Programming the Linpack benchmark for the IBM PowerXCell 8i processor

Scientific Programming - High Performance Computing with the Cell Broadband Engine
Anasazi software for the numerical solution of large-scale eigenvalue problems

ACM Transactions on Mathematical Software (TOMS)
Programming matrix algorithms-by-blocks for thread-level parallelism

ACM Transactions on Mathematical Software (TOMS)
C++ Bindings to External Software Libraries with Examples from BLAS, LAPACK, UMFPACK, and MUMPS

ACM Transactions on Mathematical Software (TOMS)
Streamlining Offload Computing to High Performance Architectures

ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
From Silicon to Science: The Long Road to Production Reconfigurable Supercomputing

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Paravirtualization effect on single- and multi-threaded memory-intensive linear algebra software

Cluster Computing
Computational tools for the analysis of spatial patterns of gene expression in Common Lisp

Proceedings of the 2007 International Lisp Conference
On the Need for a Consortium of Capability Centers

International Journal of High Performance Computing Applications
A friendly Fortran DDE solver

Applied Numerical Mathematics
ScaLAPACK's MRRR algorithm

ACM Transactions on Mathematical Software (TOMS)
A message-passing hardware/software cosimulation environment for reconfigurable computing systems

International Journal of Reconfigurable Computing - Special issue on selected papers from ReConFig 2008
Implementing sparse matrix-vector multiplication on throughput-oriented processors

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Replacing square roots by Pythagorean sums

IBM Journal of Research and Development
Design and exploitation of a high-performance SIMD floating-point unit for Blue Gene/L

IBM Journal of Research and Development
Scaling LAPACK panel operations using parallel cache assignment

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Investigating the properties of optimal sensory and motor synergies in a nonlinear model of arm dynamics

IJCNN'09 Proceedings of the 2009 international joint conference on Neural Networks
CONDOR, a new parallel, constrained extension of Powell's UOBYQA algorithm: Experimental results and comparison with the DFO algorithm

Journal of Computational and Applied Mathematics
Rectangular full packed format for Cholesky's algorithm: factorization, solution, and inversion

ACM Transactions on Mathematical Software (TOMS)
A collection of parallel linear equations routines for the Denelcor HEP

Parallel Computing
The impact of memory organization on the performance of matrix calculations

Parallel Computing
The performance of the BLAS and LAPACK on a shared memory scalar multiprocessor

Parallel Computing
Toward a better parallel performance metric

Parallel Computing
On the performance of transputer networks for solving linear systems of equations

Parallel Computing
Self-adapting numerical software and automatic tuning of heuristics

ICCS'03 Proceedings of the 2003 international conference on Computational science
Minimal data copy for dense linear algebra factorization

PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Performance evaluation of basic linear algebra subroutines on a matrix co-processor

PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Implementing and optimizing a data-intensive hydrodynamics application on the stream processor

ICCSA'07 Proceedings of the 2007 international conference on Computational science and its applications - Volume Part III
Composing parallel software efficiently with Lithe

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Programming the Linpack benchmark for Roadrunner

IBM Journal of Research and Development
Optimization of triangular matrix functions in BLAS library on Loongson2F

NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Using hybrid CPU-GPU platforms to accelerate the computation of the matrix sign function

Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
The general matrix multiply-add operation on 2D torus

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A simulator for parallel applications with dynamically varying compute node allocation

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Solving Very Sparse Rational Systems of Equations

ACM Transactions on Mathematical Software (TOMS)
Simple optimizations for an applicative array language for graphics processors

Proceedings of the sixth workshop on Declarative aspects of multicore programming
DESOLA: An active linear algebra library using delayed evaluation and runtime code generation

Science of Computer Programming
Exact solutions to linear systems of equations using output sensitive lifting

ACM Communications in Computer Algebra
Projected sequential Gaussian processes: A C++ tool for interpolation of large datasets with heterogeneous noise

Computers & Geosciences
Solving dense interval linear systems with verified computing on multicore architectures

VECPAR'10 Proceedings of the 9th international conference on High performance computing for computational science
Modeling and predicting the efficiency of application execution in distributed environments

ICCOMP'06 Proceedings of the 10th WSEAS international conference on Computers
Improving CSE software through reproducibility requirements

Proceedings of the 4th International Workshop on Software Engineering for Computational Science and Engineering
Numerical Python for scalable architectures

Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
A domain-decomposing parallel sparse linear system solver

Journal of Computational and Applied Mathematics
Exploiting parallelism in matrix-computation kernels for symmetric multiprocessor systems: Matrix-multiplication and matrix-addition algorithm optimizations by software pipelining and threads allocation

ACM Transactions on Mathematical Software (TOMS)
Physis: an implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
The Combinatorial BLAS: design, implementation, and applications

International Journal of High Performance Computing Applications
Conditioning and error estimation in the numerical solution of matrix Riccati equations

NAA'04 Proceedings of the Third international conference on Numerical Analysis and its Applications
Parallelising matrix operations on clusters for an optimal control-based quantum compiler

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Deciding where to call performance libraries

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
A matrix-type for performance–portability

PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Rapid development of high-performance linear algebra libraries

PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Semi-automatic generation of grid computing interfaces for numerical software libraries

PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Parallelization of general matrix multiply routines using OpenMP

WOMPAT'04 Proceedings of the 5th international conference on OpenMP Applications and Tools: shared Memory Parallel Programming with OpenMP
On domain-specific languages reengineering

GPCE'05 Proceedings of the 4th international conference on Generative Programming and Component Engineering
Data mining with parallel support vector machines for classification

ADVIS'06 Proceedings of the 4th international conference on Advances in Information Systems
MadLINQ: large-scale distributed matrix computation for the cloud

Proceedings of the 7th ACM European conference on Computer Systems
A generalization of s-step variants of gradient methods

Journal of Computational and Applied Mathematics
Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers

Foundations and Trends® in Machine Learning
Vectorizing codes for studying long-range transport of air pollutants

Mathematical and Computer Modelling: An International Journal
Running air pollution models on the connection machine

Mathematical and Computer Modelling: An International Journal
Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures

Concurrency and Computation: Practice & Experience
The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations

Journal of Parallel and Distributed Computing
GPU-based parallel algorithms for sparse nonlinear systems

Journal of Parallel and Distributed Computing
Programming many-core architectures - a case study: dense matrix computations on the Intel single-chip cloud computer processor

Concurrency and Computation: Practice & Experience
Generalizing matrix multiplication for efficient computations on modern computers

PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
Modeling performance through memory-stalls

ACM SIGMETRICS Performance Evaluation Review
Families of Algorithms for Reducing a Matrix to Condensed Form

ACM Transactions on Mathematical Software (TOMS)
Efficiently combining parallel software using fine-grained, language-level, hierarchical resource management policies

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Parallelizing dense linear algebra operations with task queues in llc

PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Expressing graph algorithms using generalized active messages

Proceedings of the 27th ACM International Conference on Supercomputing
Scaling LAPACK panel operations using parallel cache assignment

ACM Transactions on Mathematical Software (TOMS)
Cache efficient implementation for block matrix operations

Proceedings of the High Performance Computing Symposium
Discrete adjoints of PETSc through dco/c++ and adjoint MPI

Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
A case study in mechanically deriving dense linear algebra code

International Journal of High Performance Computing Applications
Trends and outlook for the massive-scale analytics stack

IBM Journal of Research and Development

Abstract