This article describes the design rationale, a C implementation, and conformance testing of a subset of the new Standard for the BLAS (Basic Linear Algebra Subprograms): Extended and Mixed Precision BLAS. Permitting higher internal precision and mixed input/output types and precisions allows us to implement some algorithms that are simpler, more accurate, and sometimes faster than is possible without these features. The new BLAS are challenging to implement and test because there are many more subroutines than in the existing Standard, and because we must be able to assess whether a higher precision is used for internal computations than is used for either input or output variables. We have therefore developed an automated process of generating and systematically testing these routines. Our methodology is applicable to languages besides C. In particular, the algorithms used in our testing code will be valuable to all other BLAS implementors. Our extra precision routines achieve excellent performance, close to half of the machine peak Megaflop rate even for the Level 2 BLAS, when the data access is stride one.
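To make "higher internal precision" concrete, here is a minimal sketch in C of a dot product whose inputs and output are single precision while the accumulation is carried out in double precision. The function name `dot_float_extra` is hypothetical and chosen only for illustration; it is not a routine from the Standard or from the implementation the article describes, and the article's routines may use a different internal format.

```c
#include <stddef.h>

/* Illustrative sketch only: dot_float_extra is a hypothetical name,
 * not part of the BLAS Standard or of the article's implementation.
 * Inputs and output are float; products and the running sum are
 * kept in double, then rounded once to float at the end. */
float dot_float_extra(size_t n, const float *x, const float *y)
{
    double acc = 0.0;  /* internal accumulator, wider than the data */
    for (size_t i = 0; i < n; i++) {
        /* The product of two floats is exactly representable in a double. */
        acc += (double)x[i] * (double)y[i];
    }
    return (float)acc;  /* single rounding back to the output precision */
}
```

For example, with `x = {1e8f, 1.0f, -1e8f}` and `y` all ones, the double accumulator holds 100000001 exactly before the final subtraction, so the result is 1. An accumulator held strictly in single precision would round 100000001 to 100000000 and lose the contribution of the middle term.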