This paper describes an extension to the set of Basic Linear Algebra Subprograms (BLAS). The extension targets matrix-vector operations and is intended to support efficient, portable implementations of algorithms on high-performance computers.
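As a rough illustration of what a matrix-vector operation of this kind can look like, the sketch below computes y <- alpha*A*x + beta*y in plain Python. The abstract does not list the individual routines, so the choice of operation, the function name `gemv`, and its argument order are assumptions made for illustration only; this is not the interface defined in the paper.

```python
def gemv(alpha, a, x, beta, y):
    """Return alpha * A * x + beta * y as a new list.

    Hypothetical helper for illustration: `a` is a dense matrix stored
    as a list of rows, `x` and `y` are lists of numbers.
    """
    if any(len(row) != len(x) for row in a):
        raise ValueError("each row of A must have len(x) entries")
    if len(a) != len(y):
        raise ValueError("A must have len(y) rows")
    # One dot product per row of A, then scale and add the old y entry.
    return [
        alpha * sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + beta * y_i
        for row, y_i in zip(a, y)
    ]


result = gemv(2.0, [[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], 1.0, [10.0, 20.0])
```

Expressing an algorithm in terms of a small set of such kernels is what lets a tuned, machine-specific version be substituted underneath without changing the calling code.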