Mathematical and Computer Modelling: An International Journal
We formalize and quantify various aspects of reliable computing, with emphasis on efficient fault recovery. The mathematical model that proves most appropriate is provided by graph theory. We have developed new measures for fault recovery and observe that the values of the elements of the fault recovery vector depend not only on the computation graph H and the architecture graph G, but also on the specific location of a fault. In our examples, we choose a hypercube as a representative parallel computer architecture and a pipeline as a typical configuration for program execution. We define dependability qualities of such a system with or without a fault. These qualities are determined by the resiliency triple, which consists of three parameters: multiplicity, robustness, and configurability. We also introduce parameters for measuring recovery effectiveness in terms of distance, time, and the number of new, used, and moved nodes and edges.
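To make the hypercube and pipeline setting concrete, the following minimal sketch places a pipeline (the computation graph H) on a hypercube (the architecture graph G) and repairs it after a single node fault. It is an illustration under stated assumptions, not the paper's method: the Gray-code placement, the local repair rule, and the function names `gray_code_pipeline`, `repair_pipeline`, and `recovery_counts` are all hypothetical, and the counts it reports (moved stages, new nodes, new edges) are only one plausible reading of the recovery parameters named in the abstract.

```python
def gray_code_pipeline(stages):
    """Assumed placement: stage i runs on hypercube node i XOR (i >> 1).

    Consecutive reflected Gray code words differ in one bit, so
    consecutive stages sit on adjacent hypercube nodes.
    """
    return [i ^ (i >> 1) for i in range(stages)]


def is_pipeline(path):
    """True if all nodes are distinct and consecutive nodes differ in one bit."""
    distinct = len(set(path)) == len(path)
    adjacent = all(bin(a ^ b).count("1") == 1 for a, b in zip(path, path[1:]))
    return distinct and adjacent


def repair_pipeline(path, faulty, n):
    """Replace the faulty node by a spare node of the n-cube.

    Illustrative local rule (an assumption): an interior faulty node f with
    predecessor p and successor s is replaced by p ^ s ^ f, the only other
    common neighbor of p and s. A faulty endpoint is replaced by any unused
    neighbor of the adjacent stage. Returns None if no spare node is free.
    Assumes the pipeline has at least two stages.
    """
    i = path.index(faulty)
    used = set(path)  # includes the faulty node, so it is never reselected
    if 0 < i < len(path) - 1:
        candidates = [path[i - 1] ^ path[i + 1] ^ faulty]
    else:
        anchor = path[1] if i == 0 else path[-2]
        candidates = [anchor ^ (1 << b) for b in range(n)]
    for c in candidates:
        if c not in used:
            return path[:i] + [c] + path[i + 1:]
    return None


def recovery_counts(old, new):
    """Count moved stages, new nodes, and new edges (illustrative definitions)."""
    def edges(p):
        return {frozenset(e) for e in zip(p, p[1:])}
    return {
        "moved_stages": sum(a != b for a, b in zip(old, new)),
        "new_nodes": len(set(new) - set(old)),
        "new_edges": len(edges(new) - edges(old)),
    }


# Five-stage pipeline on a 3-cube, with a fault at node 2 (stage index 3).
old = gray_code_pipeline(5)
new = repair_pipeline(old, faulty=2, n=3)
counts = recovery_counts(old, new)
```

In this example the cost of recovery depends on where the fault occurs: a fault at node 2 is repaired by moving one stage to a spare node, whereas a fault at node 1 in a four-stage pipeline cannot be repaired by this local rule because the only alternative node is already in use. This matches the abstract's observation that the recovery measures depend on the fault location as well as on G and H.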