Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy

Authors:
Alfredo Buttari;Jack Dongarra;Jakub Kurzak;Piotr Luszczek;Stanimir Tomov
Affiliations:
ENS Lyon;University of Tennessee Knoxville and Oak Ridge National Laboratory and University of Manchester;University of Tennessee Knoxville;The MathWorks;University of Tennessee Knoxville
Venue:
ACM Transactions on Mathematical Software (TOMS)
Year:
2008

Abstract

Combining 32-bit and 64-bit floating-point arithmetic can significantly improve the performance of many sparse linear algebra algorithms while maintaining 64-bit accuracy in the resulting solution. These ideas apply to sparse direct techniques (multifrontal and supernodal) and to sparse iterative techniques such as Krylov subspace methods. The approach applies not only to conventional processors but also to exotic technologies such as Field-Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs), and the Cell BE processor.
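
The idea in the abstract can be illustrated with mixed-precision iterative refinement: factor and solve in 32-bit arithmetic, then compute residuals and accumulate corrections in 64-bit arithmetic. The sketch below is an illustration under stated assumptions, not the authors' implementation. A tridiagonal LU factorization without pivoting stands in for a real multifrontal or supernodal solver, 32-bit arithmetic is emulated by rounding each result through `struct`, and every function name (`f32`, `factor_tridiag_f32`, `solve_tridiag_f32`, `mixed_precision_solve`) is hypothetical.

```python
import struct


def f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def tridiag_matvec(lower, diag, upper, x):
    """Compute A @ x in 64-bit arithmetic for a tridiagonal A."""
    n = len(diag)
    y = []
    for i in range(n):
        s = diag[i] * x[i]
        if i > 0:
            s += lower[i - 1] * x[i - 1]
        if i < n - 1:
            s += upper[i] * x[i + 1]
        y.append(s)
    return y


def factor_tridiag_f32(lower, diag, upper):
    """LU-factor A without pivoting, rounding every operation to 32 bits."""
    d = [f32(v) for v in diag]
    u = [f32(v) for v in upper]
    l = [0.0] * (len(diag) - 1)
    for i in range(1, len(diag)):
        l[i - 1] = f32(f32(lower[i - 1]) / d[i - 1])
        d[i] = f32(d[i] - f32(l[i - 1] * u[i - 1]))
    return l, d, u


def solve_tridiag_f32(factors, rhs):
    """Forward and back substitution with the 32-bit factors."""
    l, d, u = factors
    n = len(d)
    y = [f32(v) for v in rhs]
    for i in range(1, n):
        y[i] = f32(y[i] - f32(l[i - 1] * y[i - 1]))
    x = [0.0] * n
    x[n - 1] = f32(y[n - 1] / d[n - 1])
    for i in range(n - 2, -1, -1):
        x[i] = f32(f32(y[i] - f32(u[i] * x[i + 1])) / d[i])
    return x


def mixed_precision_solve(lower, diag, upper, b, tol=1e-14, max_iter=20):
    """Solve A x = b: 32-bit factorization, 64-bit residuals and updates."""
    factors = factor_tridiag_f32(lower, diag, upper)  # done once, in 32-bit
    x = solve_tridiag_f32(factors, b)                 # initial 32-bit solve
    norm_b = max(abs(v) for v in b)
    for it in range(max_iter):
        ax = tridiag_matvec(lower, diag, upper, x)
        r = [bi - axi for bi, axi in zip(b, ax)]      # 64-bit residual
        if max(abs(v) for v in r) <= tol * norm_b:
            return x, it
        z = solve_tridiag_f32(factors, r)             # 32-bit correction
        x = [xi + zi for xi, zi in zip(x, z)]         # 64-bit update
    return x, max_iter


# Illustrative test problem (an assumption): 1D Poisson matrix tridiag(-1, 2, -1).
n = 50
lower, diag, upper = [-1.0] * (n - 1), [2.0] * n, [-1.0] * (n - 1)
x_true = [1.0 + i / n for i in range(n)]
b = tridiag_matvec(lower, diag, upper, x_true)
x, iters = mixed_precision_solve(lower, diag, upper, b)
err = max(abs(xi - ti) for xi, ti in zip(x, x_true))
print("refinement steps:", iters, "max error:", err)
```

The factorization is the expensive step and runs only once, in the cheaper precision; each refinement step reuses the same factors and adds only a matrix-vector product and a pair of triangular solves. Convergence depends on the matrix being reasonably well conditioned relative to 32-bit precision, which holds for this small test problem but is not guaranteed in general.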