An Isochronous Testing Strategy for Hierarchical Adaptive Distributed System-Level Diagnosis

Authors:
Alessandro Brawerman; Elias Procópio Duarte, Jr.
Affiliations:
Federal University of Paraná, Department of Informatics, Cx. Postal 19081, Curitiba, 81531-990 PR, Brazil (brawer@uol.com.br; elias@inf.ufpr.br)
Venue:
Journal of Electronic Testing: Theory and Applications
Year:
2001

References

Simulating computer systems: techniques and tools.
Implementation of Online Distributed System-Level Diagnosis Theory. IEEE Transactions on Computers, Special issue on fault-tolerant computing.
Fault tolerance in distributed systems.
System diagnosis. Fault-tolerant computer system design.
A Hierarchical Adaptive Distributed System-Level Diagnosis Algorithm. IEEE Transactions on Computers.

Abstract

Distributed system-level diagnosis allows the fault-free components of a fault-tolerant distributed system to determine which components of the system are faulty and which are fault-free. The time it takes for the nodes running the algorithm to diagnose a new event is called the algorithm's latency. In this paper we present a new distributed system-level diagnosis algorithm that has a latency of $O(\log N)$ testing rounds for a system of $N$ nodes, whereas a previous hierarchical distributed system-level diagnosis algorithm, Hi-ADSD, has a latency of $O(\log^2 N)$ testing rounds. Nodes are grouped into progressively larger logical clusters for the purpose of testing. The algorithm employs an isochronous testing strategy that forces all fault-free nodes to test clusters of the same size in each testing round. This strategy is based on two main principles: a tested node must test its tester in the same round, and a node accepts tests only according to a lexical priority order. We present formal proofs that the algorithm's latency is at most $2\log N - 1$ testing rounds and that its testing strategy leads to the execution of isochronous tests. Simulation results are presented for systems of up to 64 nodes.
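A minimal Python sketch of the testing structure the abstract describes, under stated assumptions. The cluster function assumes the recursive Hi-ADSD definition c(i,s) = (j, c(j,1), ..., c(j,s-1)) with j = i XOR 2^(s-1), which the abstract itself does not spell out. The round schedule (every node tests level (r mod log N) + 1 in round r), the "test the cluster in order until a fault-free node is found" rule, and the helper names `cluster`, `cluster_level`, `run_round` and `latency_bounds` are hypothetical illustrations. The sketch does not implement the paper's two principles (testing one's tester, lexical priority order); it only shows the setting they apply to: with no faults, every tested node tests its tester in the same round, while a single fault can break that symmetry.

```python
# Sketch: Hi-ADSD-style clusters and one uniform ("isochronous") testing round.
# Assumptions (not taken from the paper): the recursive cluster definition,
# the round-to-level schedule, and all helper names below are hypothetical.


def log2_exact(n: int) -> int:
    """Return log2(n) for a power of two n >= 2."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    return n.bit_length() - 1


def cluster(i: int, s: int) -> list[int]:
    """Ordered cluster c(i,s), assuming c(i,s) = (j, c(j,1), ..., c(j,s-1))
    with j = i XOR 2^(s-1); the result holds 2^(s-1) node ids."""
    j = i ^ (1 << (s - 1))
    members = [j]
    for level in range(1, s):
        members.extend(cluster(j, level))
    return members


def cluster_level(round_no: int, n: int) -> int:
    """Hypothetical schedule: in round r every node tests level (r mod log2 N) + 1,
    so all nodes test clusters of the same size in the same round."""
    return round_no % log2_exact(n) + 1


def run_round(round_no: int, n: int, faulty: set[int]) -> dict[int, int]:
    """Each fault-free node tests its cluster in order until it finds a fault-free
    node; returns tester -> that node (testers that find none are omitted)."""
    s = cluster_level(round_no, n)
    tests: dict[int, int] = {}
    for i in range(n):
        if i in faulty:
            continue  # faulty nodes do not run tests
        for j in cluster(i, s):
            if j not in faulty:
                tests[i] = j
                break
    return tests


def latency_bounds(n: int) -> tuple[int, int]:
    """(2 log N - 1, log^2 N): the paper's bound vs. Hi-ADSD's read literally."""
    log_n = log2_exact(n)
    return 2 * log_n - 1, log_n ** 2


if __name__ == "__main__":
    print(cluster(0, 3))                  # [4, 5, 6, 7]
    fault_free = run_round(2, 8, set())   # round 2 with N = 8 uses level s = 3
    print(fault_free[0], fault_free[4])   # 4 0
    one_fault = run_round(2, 8, {4})
    print(one_fault[0], one_fault[5])     # 5 1
    print(latency_bounds(64))             # (11, 36)
```

With node 4 faulty, node 0 ends up testing node 5, but node 5 tests node 1, so node 5 does not test its tester in that round. For N = 64, the largest system simulated in the paper, the sketch gives 11 rounds for the 2 log N − 1 bound versus 36 when Hi-ADSD's O(log² N) bound is read literally as log² N, a reading assumed here only for illustration.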