Quantifying fault recovery in multiprocessor systems

Authors:
Frank Harary; Miroslaw Malek
Affiliations:
New Mexico State University, Las Cruces, NM 88003, U.S.A.; Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712-1084, U.S.A.
Venue:
Mathematical and Computer Modelling: An International Journal
Year:
1993

Abstract

We formalize and quantify various aspects of reliable computing, with emphasis on efficient fault recovery. The mathematical model that proves most appropriate is provided by the theory of graphs. We have developed new measures for fault recovery and observe that the values of the elements of the fault recovery vector depend not only on the computation graph H and the architecture graph G, but also on the specific location of a fault. In our examples, we choose a hypercube as a representative parallel computer architecture and a pipeline as a typical configuration for program execution. We define dependability qualities of such a system with or without a fault. These qualities are determined by the resiliency triple, which is defined by three parameters: multiplicity, robustness, and configurability. We also introduce parameters for measuring recovery effectiveness in terms of distance, time, and the number of new, used, and moved nodes and edges.
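
The dependence of recovery cost on fault location can be illustrated with a small sketch. It is not the authors' method, and the abstract does not give the definition of the fault recovery vector. The sketch assumes a pipeline H placed along a reflected Gray code path in an n-dimensional hypercube G, and a hypothetical local recovery rule that replaces a faulty interior stage with the only other hypercube node adjacent to both of its pipeline neighbors. The function names (`gray_code`, `embed_pipeline`, `recover`) and the cost keys are illustrative assumptions.

```python
def gray_code(n):
    """Reflected Gray code on n bits: consecutive labels differ in one bit."""
    return [i ^ (i >> 1) for i in range(2 ** n)]


def embed_pipeline(n, k):
    """Place a k-stage pipeline on the first k Gray-code nodes of an n-cube."""
    if not 1 <= k <= 2 ** n:
        raise ValueError("pipeline does not fit in the hypercube")
    return gray_code(n)[:k]


def is_adjacent(u, v):
    """Two hypercube nodes are adjacent iff their labels differ in one bit."""
    x = u ^ v
    return x != 0 and x & (x - 1) == 0


def recover(placement, stage):
    """Hypothetical local recovery for a fault at an interior pipeline stage.

    The predecessor p and successor s of the faulty node f are at distance 2
    in the hypercube, so they have exactly two common neighbors: f itself and
    p ^ s ^ f. If that second node is free, the stage moves there.
    Returns (new_placement, cost) or None if the spare is already in use.
    """
    if not 0 < stage < len(placement) - 1:
        raise ValueError("this sketch handles interior stages only")
    p, f, s = placement[stage - 1], placement[stage], placement[stage + 1]
    spare = p ^ s ^ f
    if spare in placement:
        return None  # local recovery impossible at this fault location
    repaired = list(placement)
    repaired[stage] = spare
    # Illustrative counts only; the paper's own measures may differ.
    cost = {"moved_nodes": 1, "new_nodes": 1, "new_edges": 2}
    return repaired, cost


placement = embed_pipeline(3, 5)          # 5-stage pipeline in a 3-cube
print(placement)                          # [0, 1, 3, 2, 6]
print(recover(placement, 1))              # None: the spare node is occupied
print(recover(placement, 3))              # stage 3 moves from node 2 to node 7
```

In this toy setting, the same pipeline and the same hypercube yield a cheap local repair for a fault at stage 3 but none for a fault at stage 1, which mirrors the observation that recovery measures depend on where the fault occurs and not only on G and H.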