LAPACK's user's guide

Authors:
E. Anderson;Z. Bai;C. Bischof;J. Demmel;J. Dongarra;J. Du Croz;A. Greenbaum;S. Hammarling;A. McKenney;S. Ostrouchov;D. Sorensen
Affiliations:
Cray Research, Inc.;Univ. of Kentucky, Lexington;Argonne National Lab.;Univ. of California, Berkeley;Univ. of Tennessee and Oak Ridge National Lab.;Numerical Algorithms Group Ltd.;Courant Institute of Mathematical Sciences, New York Univ., New York, NY;Numerical Algorithms Group Ltd.;Courant Institute of Mathematical Sciences, New York Univ., New York, NY;Univ. of Tennessee;Rice Univ., Houston, TX
Venue:
LAPACK's user's guide
Year:
1992

Citing 0
Cited 201

FORTRAN subroutines for general Toeplitz systems

ACM Transactions on Mathematical Software (TOMS)
QR-like algorithms for the nonsymmetric eigenvalue problem

ACM Transactions on Mathematical Software (TOMS)
RISC microprocessors and scientific computing

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
A parallel block implementation of Level-3 BLAS for MIMD vector processors

ACM Transactions on Mathematical Software (TOMS)
Algorithms for intersecting parametric and algebraic curves I: simple intersections

ACM Transactions on Graphics (TOG)
MOB forms: a class of multilevel block algorithms for dense linear algebra operations

ICS '94 Proceedings of the 8th international conference on Supercomputing
Computing selected solutions of polynomial equations

ISSAC '94 Proceedings of the international symposium on Symbolic and algebraic computation
Monomial bases and polynomial system solving (extended abstract)

ISSAC '94 Proceedings of the international symposium on Symbolic and algebraic computation
Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms

IBM Journal of Research and Development
Algorithm 741: least-squares solution of a linear, bordered, block-diagonal system of equations

ACM Transactions on Mathematical Software (TOMS)
Fast floating-point processing in Common Lisp

ACM Transactions on Mathematical Software (TOMS)
Public international benchmarks for parallel computers: PARKBENCH committee: Report-1

Scientific Programming
Computing the MDMT decomposition

ACM Transactions on Mathematical Software (TOMS)
Numeric-symbolic algorithms for evaluating one-dimensional algebraic sets

ISSAC '95 Proceedings of the 1995 international symposium on Symbolic and algebraic computation
Efficient vector and parallel manipulation of tensor products

ACM Transactions on Mathematical Software (TOMS)
Algorithm 753: TENPACK: a LAPACK-based library for the computer manipulation of tensor products

ACM Transactions on Mathematical Software (TOMS)
LAPACK-style algorithms and software for solving the generalized Sylvester equation and estimating the separation between regular matrix pairs

ACM Transactions on Mathematical Software (TOMS)
Handling floating-point exceptions in numeric programs

ACM Transactions on Programming Languages and Systems (TOPLAS)
The design of MA48: a code for the direct solution of sparse unsymmetric linear systems of equations

ACM Transactions on Mathematical Software (TOMS)
The design and implementation of SOLAR, a portable library for scalable out-of-core linear algebra computations

Proceedings of the fourth workshop on I/O in parallel and distributed systems: part of the federated computing research conference
Analysis of zero clusters in multivariate polynomial systems

ISSAC '96 Proceedings of the 1996 international symposium on Symbolic and algebraic computation
Object-oriented design of preconditioned iterative methods in diffpack

ACM Transactions on Mathematical Software (TOMS)
Algorithm 767: a Fortran 77 package for column reduction of polynomial matrices

ACM Transactions on Mathematical Software (TOMS)
On the parallel complexity of matrix factorization algorithms

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
A reordered Schur factorization method for zero-dimensional polynomial systems with multiple roots

ISSAC '97 Proceedings of the 1997 international symposium on Symbolic and algebraic computation
Algorithms and design for a second-order automatic differentiation module

ISSAC '97 Proceedings of the 1997 international symposium on Symbolic and algebraic computation
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology

ICS '97 Proceedings of the 11th international conference on Supercomputing
Practical experience in the numerical dangers of heterogeneous computing

ACM Transactions on Mathematical Software (TOMS)
Preservation of passivity during RLC network reduction via split congruence transformations

DAC '97 Proceedings of the 34th annual Design Automation Conference
Level 3 basic linear algebra subprograms for sparse matrices: a user-level interface

ACM Transactions on Mathematical Software (TOMS)
On a High Order Numerical Method for Solving Partial Differential Equations in Complex Geometries

Journal of Scientific Computing
Algorithms for block bidiagonal systems on vector and parallel computers

ICS '98 Proceedings of the 12th international conference on Supercomputing
Algorithm 776: SRRIT: a Fortran subroutine to calculate the dominant invariant subspace of a nonsymmetric matrix

ACM Transactions on Mathematical Software (TOMS)
Algorithm 777: HOMPACK90: a suite of Fortran 90 codes for globally convergent homotopy algorithms

ACM Transactions on Mathematical Software (TOMS)
The design, implementation, and evaluation of a symmetric banded linear solver for distributed-memory parallel computers

ACM Transactions on Mathematical Software (TOMS)
Computing rank-revealing QR factorizations of dense matrices

ACM Transactions on Mathematical Software (TOMS)
Algorithm 782: codes for rank-revealing QR factorizations of dense matrices

ACM Transactions on Mathematical Software (TOMS)
GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark

ACM Transactions on Mathematical Software (TOMS)
Algorithm 784: GEMM-based level 3 BLAS: portability and optimization issues

ACM Transactions on Mathematical Software (TOMS)
Algorithm 788: automatic boundary integral equation programs for the planar Laplace equation

ACM Transactions on Mathematical Software (TOMS)
Robust Algorithms for Object Localization

International Journal of Computer Vision
On Improvements to the Analytic Center Cutting Plane Method

Computational Optimization and Applications
Self-adapting Fortran 77 machine constants: comment on Algorithm 528

ACM Transactions on Mathematical Software (TOMS)
C++ classes for linking optimization with complex simulations

ACM Transactions on Mathematical Software (TOMS)
The RISC BLAS: a blocked implementation of level 3 BLAS for RISC processors

ACM Transactions on Mathematical Software (TOMS)
Memory characteristics of iterative methods

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Direct numerical simulation of turbulence with a PC/linux cluster: fact or fiction?

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
An annotation language for optimizing software libraries

Proceedings of the 2nd conference on Domain-specific languages
Blocked algorithms and software for reduction of a regular matrix pair to generalized Schur form

ACM Transactions on Mathematical Software (TOMS)
C++ implementations of numerical methods for solving differential-algebraic equations: design and optimization considerations

ACM Transactions on Mathematical Software (TOMS)
Design and Performance Evaluation of a Portable Parallel Library for Space-Time Adaptive Processing

IEEE Transactions on Parallel and Distributed Systems
Exact computations of the inertia symmetric integer matrices

STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Design and evaluation of a linear algebra package for Java

Proceedings of the ACM 2000 conference on Java Grande
Algorithm 800: Fortran 77 subroutines for computing the eigenvalues of Hamiltonian matrices. I: the square-reduced method

ACM Transactions on Mathematical Software (TOMS)
Algorithm 801: POLSYS_PLP: a partitioned linear product homotopy code for solving polynomial systems of equations

ACM Transactions on Mathematical Software (TOMS)
Band reduction algorithms revisited

ACM Transactions on Mathematical Software (TOMS)
ScaLAPACK: a portable linear algebra library for distributed memory computers - design issues and performance

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Implementation of Strassen's algorithm for matrix multiplication

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
NetSolve: a network server for solving computational science problems

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Tiling imperfectly-nested loop nests

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Basic research for coloring multichannel MRI data

Proceedings of the conference on Visualization '00
Fractal symbolic analysis

ICS '01 Proceedings of the 15th international conference on Supercomputing
Polynomial root finding using iterated Eigenvalue computation

Proceedings of the 2001 international symposium on Symbolic and algebraic computation
FLAME: Formal Linear Algebra Methods Environment

ACM Transactions on Mathematical Software (TOMS)
Tuning Strassen's matrix multiplication for memory efficiency

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
High performance first principles method for complex magnetic properties

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
High performance software on Intel Pentium Pro processors or Micro-Ops to TeraFLOPS

SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
A practical approach to sample-path simulation optimization

Proceedings of the 32nd conference on Winter simulation
Spectral Mixture Analysis: Linear and Semi-parametric Full and Iterated Partial Unmixing in Multi- and Hyperspectral Image Data

Journal of Mathematical Imaging and Vision
Spectral Mixture Analysis: Linear and Semi-parametric Full and Iterated Partial Unmixing in Multi- and Hyperspectral Image Data

International Journal of Computer Vision - Joint special issue on image analysis
Efficient Algorithms for the Block Hessenberg Form

The Journal of Supercomputing
Register tiling in nonrectangular iteration spaces

ACM Transactions on Programming Languages and Systems (TOPLAS)
An overview of the sparse basic linear algebra subprograms: The new standard from the BLAS technical forum

ACM Transactions on Mathematical Software (TOMS)
Distribution Assignment Placement: Effective Optimization of Redistribution Costs

IEEE Transactions on Parallel and Distributed Systems
Component-based derivation of a parallel stiff ODE solver implemented in a cluster of computers

International Journal of Parallel Programming
Automatic versus manual model differentiation to compute sensitivities and solve non-linear inverse problems

Computers & Geosciences
Parallel algorithms for LQ optimal control of discrete-time periodic linear systems

Journal of Parallel and Distributed Computing
An Attempt for Coloring Multichannel MR Imaging Data

IEEE Transactions on Visualization and Computer Graphics
Faster Numerical Algorithms Via Exception Handling

IEEE Transactions on Computers
A language approach to high performance computing on heterogeneous networks

Progress in computer research
A comparison of parallel solvers for diagonally dominant and general narrow-banded linear systems

Parallel numerical linear algebra
Fault-Detection by Result-Checking for the Eigenproblem

EDCC-3 Proceedings of the Third European Dependable Computing Conference on Dependable Computing
Data-Flow Oriented Visual Programming Libraries for Scientific Computing

ICCS '02 Proceedings of the International Conference on Computational Science-Part I
Parallel Out-of-Core Cholesky and QR Factorization with POOCLAPACK

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Time-Integration Algorithms for the Computer Treatment of the Horizontal Advection in Air Pollution Models

LSSC '01 Proceedings of the Third International Conference on Large-Scale Scientific Computing-Revised Papers
Left-Looking to Right-Looking and Vice Versa: An Application of Fractal Symbolic Analysis to Linear Algebra Code Restructuring

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
A Parallel Implementation of a Potential Reduction Algorithm for Box-Constrained Quadratic Programming

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Automatic Generation of Block-Recursive Codes

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
A New Parallel Approach to the Toeplitz Inverse Eigenproblem Using Newton-like Methods

VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
A Parallel Algorithm for Solving the Toeplitz Least Squares Problem

VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
HPF and Numerical Libraries

ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
Fault-Tolerant High-Performance Matrix Multiplication: Theory and Practice

DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
Inversion of Symmetric Matrices in a New Block Packed Storage

NAA '00 Revised Papers from the Second International Conference on Numerical Analysis and Its Applications
An Adaptive Working Set Algorithm

Messung, Modellierung und Bewertung von Rechensystemen, 2. GI/NTG-Fachtagung
A Framework for Generic State Estimation in Computer Vision Applications

ICVS '01 Proceedings of the Second International Workshop on Computer Vision Systems
An Efficient Parallel Algorithm for the Symmetric Tridiagonal Eigenvalue Problem

VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
ICENI: optimisation of component applications within a Grid environment

Parallel Computing - Special issue: Advanced environments for parallel and distributed computing
Massive data set issues in air pollution modelling

Handbook of massive data sets
Formal derivation of algorithms: The triangular sylvester equation

ACM Transactions on Mathematical Software (TOMS)
Nonlinear optimization and parallel computing

Parallel Computing - Special issue: Parallel computing in numerical optimization
Steady-State Analysis of Infinite Stochastic Petri Nets: Comparing the Spectral Expansion and the Matrix-Geometric Method

PNPM '97 Proceedings of the 6th International Workshop on Petri Nets and Performance Models
Algorithm engineering for parallel computation

Experimental algorithmics
A new decoupling technique for the hermite cubic collocation equations arising from boundary value problems

Computational science, mathematics and software
References

Sourcebook of parallel computing
Hybrid (OpenMP and MPI) parallelization of MFIX: a multiphase CFD code for modeling fluidized beds

Proceedings of the 2003 ACM symposium on Applied computing
Computing a matrix function for exponential integrators

Journal of Computational and Applied Mathematics
Vector reduction/transformation operators

ACM Transactions on Mathematical Software (TOMS)
Efficient algorithms for block downdating of least squares solutions

Applied Numerical Mathematics - Numerical algorithms, parallelism and applications
Java programming for high-performance numerical computing

IBM Systems Journal
Global optimization technique for fixed-order control design

International Journal of Systems Science
An Efficient Parallel Algorithm to Solve Block-Toeplitz Systems

The Journal of Supercomputing
Representing linear algebra algorithms in code: the FLAME application program interfaces

ACM Transactions on Mathematical Software (TOMS)
Parallel out-of-core computation and updating of the QR factorization

ACM Transactions on Mathematical Software (TOMS)
Least-squares approximation

Encyclopedia of Computer Science
Numerical Libraries and Tools for Scalable Parallel Cluster Computing

International Journal of High Performance Computing Applications
Using Python for large scale linear algebra applications

Future Generation Computer Systems - Special section: Complex problem-solving environments for grid computing
Acceleration of the generalized global basis (GGB) method for nonlinear problems

Journal of Computational Physics
Algorithm 857: POLSYS_GLP—a parallel general linear product homotopy code for solving polynomial systems of equations

ACM Transactions on Mathematical Software (TOMS)
A memory model for scientific algorithms on graphics processors

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
A constrained optimization algorithm for total energy minimization in electronic structure calculations

Journal of Computational Physics
Criteria for mixed grids in computational fluid dynamics

Mathematics and Computers in Simulation - Special issue: Applied and computational mathematics - selected papers of the fifth PanAmerican workshop - June 21-25, 2004, Tegucigalpa, Honduras
On PDE solution in transient optimization of gas networks

Journal of Computational and Applied Mathematics
Hybrid image classification and parameter selection using a shared memory parallel algorithm

Computers & Geosciences
Libckpt: transparent checkpointing under Unix

TCON'95 Proceedings of the USENIX 1995 Technical Conference Proceedings
Algorithm 867: QUADLOG—a package of routines for generating Gauss-related quadrature for two classes of logarithmic weight functions

ACM Transactions on Mathematical Software (TOMS)
Numerical techniques for computing the inertia of products of matrices of rational numbers

Proceedings of the 2007 international workshop on Symbolic-numeric computation
Application of the HLSVD technique to the filtering of X-ray diffraction data

EURASIP Journal on Applied Signal Processing
Cache-efficient numerical algorithms using graphics hardware

Parallel Computing
Scalable parallelization of FLAME code via the workqueuing model

ACM Transactions on Mathematical Software (TOMS)
A shared memory parallel algorithm for data reduction using the singular value decomposition

Proceedings of the 2008 Spring simulation multiconference
FastScat™: An Object-Oriented Program for Fast Scattering Computation

Scientific Programming - The First Annual Object-Oriented Numerics Conference (OON-SKI '93)
Misleading Performance Reporting in the Supercomputing Field

Scientific Programming
Algorithm Development for Distributed Memory Multicomputers Using CONLAB

Scientific Programming
A software framework for abstract expression of coordinate-free linear algebra and optimization algorithms

ACM Transactions on Mathematical Software (TOMS)
An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization

High Performance Computing for Computational Science - VECPAR 2008
QR factorization for the Cell Broadband Engine

Scientific Programming - High Performance Computing with the Cell Broadband Engine
Optimizing matrix multiplication for a short-vector SIMD architecture - CELL processor

Parallel Computing
Interfaces for parallel numerical linear algebra libraries in high level languages

Advances in Engineering Software
A new algorithm for computing certified numerical approximations of the roots of a zero-dimensional system

Proceedings of the 2009 international symposium on Symbolic and algebraic computation
Communication-optimal parallel and sequential Cholesky decomposition: extended abstract

Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Out-of-Core Computation of the QR Factorization on Multi-core Processors

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Study of neural net training methods in parallel and distributed architectures

Future Generation Computer Systems
Accelerating the complex Hessenberg QR algorithm with the CSX600 floating-point coprocessor

PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Designing numerical libraries in Fortran 90

Computer Standards & Interfaces
Comparative study of one-sided factorizations with multiple software packages on multi-core hardware

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Fast and reliable passivity assessment and enforcement with extended Hamiltonian pencil

Proceedings of the 2009 International Conference on Computer-Aided Design
Programming in a high level approach for scientific computing

ICCSA'03 Proceedings of the 2003 international conference on Computational science and its applications: PartI
Numerical method for regional pole assignment of linear control systems

NMA'06 Proceedings of the 6th international conference on Numerical methods and applications
A parallel Newton-GMRES algorithm for solving large scale nonlinear systems

VECPAR'02 Proceedings of the 5th international conference on High performance computing for computational science
Implementing linear algebra routines on multi-core processors with pipelining and a look ahead

PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Cholesky factorization of band matrices using multithreaded BLAS

PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Parallel implementation of a neural net training application in a heterogeneous grid environment

OTM'07 Proceedings of the 2007 OTM confederated international conference on On the move to meaningful internet systems: CoopIS, DOA, ODBASE, GADA, and IS - Volume Part II
Stability of the Richardson Extrapolation applied together with the θ-method

Journal of Computational and Applied Mathematics
SLAMM - Automating Memory Analysis for Numerical Algorithms

Electronic Notes in Theoretical Computer Science (ENTCS)
Efficient implementation of stable Richardson Extrapolation algorithms

Computers & Mathematics with Applications
Extended Hamiltonian pencil for passivity assessment and enforcement for S-parameter systems

Proceedings of the Conference on Design, Automation and Test in Europe
A fast solver for linear systems with displacement structure

Numerical Algorithms
Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Using hybrid CPU-GPU platforms to accelerate the computation of the matrix sign function

Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Scheduling parallel eigenvalue computations in a quantum chemistry code

Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
Using an iterative linear solver in an interior-point method for generating support vector machines

Computational Optimization and Applications
Technical Section: Multiresolutions numerically from subdivisions

Computers and Graphics
Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures

VECPAR'10 Proceedings of the 9th international conference on High performance computing for computational science
A mixed-precision algorithm for the solution of Lyapunov equations on hybrid CPU-GPU platforms

Parallel Computing
A note on shifted Hessenberg systems and frequency response computation

ACM Transactions on Mathematical Software (TOMS)
Optimizing symmetric dense matrix-vector multiplication on GPUs

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
A parallel block LU decomposition method for distributed finite element matrices

Parallel Computing
The Combinatorial BLAS: design, implementation, and applications

International Journal of High Performance Computing Applications
Goal-Oriented and Modular Stability Analysis

SIAM Journal on Matrix Analysis and Applications
Conditioning and error estimation in the numerical solution of matrix riccati equations

NAA'04 Proceedings of the Third international conference on Numerical Analysis and its Applications
HeteroMPI+ScaLAPACK: towards a ScaLAPACK (dense linear solvers) on heterogeneous networks of computers

HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Parallel LU factorization of band matrices on SMP systems

HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
Performance analysis of overheads for matrix – vector multiplication in cluster environment

PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics
Progressive surface reconstruction for heart mapping procedure

Computer-Aided Design
Rapid development of high-performance linear algebra libraries

PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Rapid development of high-performance out-of-core solvers

PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
On some stability properties of the richardson extrapolation applied together with the θ-method

LSSC'09 Proceedings of the 7th international conference on Large-Scale Scientific Computing
Performance modeling and optimal block size selection for the small-bulge multishift QR algorithm

ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures

Concurrency and Computation: Practice & Experience
A scalable framework for heterogeneous GPU-based clusters

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
An overview on the eigenvalue computation for matrices

Neural, Parallel & Scientific Computations
Solving systems of interval linear equations in parallel using multithreaded model and "interval extended zero" method

PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
The MADlib analytics library: or MAD skills, the SQL

Proceedings of the VLDB Endowment
Parallel, 'large', dense matrix problems: Application to 3D sequential integrated inversion of seismological and gravity data

Computers & Geosciences
Toward scalable matrix multiply on multithreaded architectures

Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Accelerating the singular value decomposition of rectangular matrices with the CSX600 and the integrable SVD

PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies
Graph expansion and communication costs of fast matrix multiplication

Journal of the ACM (JACM)
Fast parallel algorithms for blocked dense matrix multiplication on shared memory architectures

ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
A new implicit fictitious domain method for the simulation of flow in complex geometries with heat transfer

Journal of Computational Physics
Efficient generalized Hessenberg form and applications

ACM Transactions on Mathematical Software (TOMS)
An approach of the QR factorization for tall-and-skinny matrices on multicore platforms

PARA'12 Proceedings of the 11th international conference on Applied Parallel and Scientific Computing
Exploiting domain knowledge to optimize parallel computational mechanics codes

Proceedings of the 27th international ACM conference on International conference on supercomputing
A divide-and-conquer approach for solving singular value decomposition on a heterogeneous system

Proceedings of the ACM International Conference on Computing Frontiers
Communication efficient gaussian elimination with partial pivoting using a shape morphing data layout

Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
An improved parallel singular value algorithm and its implementation for multicore hardware

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Communication costs of Strassen's matrix multiplication

Communications of the ACM
Toward GPU accelerated topology optimization on unstructured meshes

Structural and Multidisciplinary Optimization
Solving large-scale optimization problems related to Bell's Theorem

Journal of Computational and Applied Mathematics
Empirical Installation of Linear Algebra Shared-Memory Subroutines for Auto-Tuning

International Journal of Parallel Programming