Self-stabilizing iterative solvers

Authors:
Piyush Sao;Richard Vuduc
Affiliations:
Georgia Institute of Technology;Georgia Institute of Technology
Venue:
ScalA '13 Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems
Year:
2013

Abstract

We show how to use the idea of self-stabilization, which originates in the context of distributed control, to construct fault-tolerant iterative solvers. In general, a self-stabilizing system is one that, starting from an arbitrary state (valid or invalid), reaches a valid state within a finite number of steps. This property gives the system a natural means of tolerating transient faults. We give two proof-of-concept examples of self-stabilizing iterative linear solvers: one for steepest descent (SD) and one for conjugate gradients (CG). Our self-stabilized versions of SD and CG require only a small amount of fault detection; for example, it may suffice to check for NaNs and infinities. We test our approach experimentally by analyzing its convergence and overhead for different types and rates of faults. Beyond the specific findings of this paper, we believe self-stabilization is a promising and more generally useful tool for constructing resilient solvers.
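To make the idea concrete, the following is a minimal illustrative sketch in Python and is not the algorithm from the paper. It assumes a small symmetric positive definite system and a textbook steepest-descent loop that recomputes the residual from the current iterate at every step, so any finite iterate is a valid state to continue from. The only fault detection is a check for NaNs and infinities, after which the iterate is reset to zero. The function name `steepest_descent`, the parameters `fault_at` and `fault_value` (used to simulate a transient fault), and the test matrix are all hypothetical choices made for illustration.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_finite(v):
    """Cheap fault detection: reject NaNs and infinities only."""
    return all(math.isfinite(a) for a in v)


def steepest_descent(A, b, tol=1e-10, max_iter=1000,
                     fault_at=None, fault_value=1e6):
    """Steepest descent for SPD A with a simulated transient fault.

    Returns the approximate solution and the number of iterations used.
    """
    n = len(b)
    x = [0.0] * n
    for k in range(max_iter):
        if k == fault_at:
            x[0] = fault_value  # simulated transient fault (assumption)
        if not is_finite(x):
            x = [0.0] * n  # non-finite state detected: reset to a valid one
        # Recompute the residual from x itself, so a corrupted but finite
        # iterate is just another starting point for the iteration.
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        rr = dot(r, r)
        if math.sqrt(rr) <= tol:
            return x, k
        alpha = rr / dot(r, matvec(A, r))  # exact line search along r
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x, max_iter


# Hypothetical 5x5 SPD test problem: tridiagonal with 4 on the diagonal
# and -1 on the off-diagonals.
A = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(5)] for i in range(5)]
b = [1.0] * 5

x_clean, it_clean = steepest_descent(A, b)
x_nan, it_nan = steepest_descent(A, b, fault_at=3, fault_value=float("nan"))
x_big, it_big = steepest_descent(A, b, fault_at=3, fault_value=1e6)
```

In this sketch a NaN fault triggers the reset, while a large but finite corruption is never detected at all; the loop still converges because it does not rely on any state other than the current iterate. The price is extra iterations after the fault, which reflects the kind of convergence overhead the abstract says is studied experimentally.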