A survey of methods for system-level fault diagnosis

  • Authors:
  • J. Xu;L. Lilien

  • Affiliations:
  • Department of Electrical Engineering and Computer Science, University of Illinois at Chicago, Chicago, Illinois;Department of Electrical Engineering and Computer Science, University of Illinois at Chicago, Chicago, Illinois

  • Venue:
  • ACM '87 Proceedings of the 1987 Fall Joint Computer Conference on Exploring technology: today and tomorrow
  • Year:
  • 1987

Abstract

With the increasing need for efficient means of automatic fault diagnosis in large distributed computing systems, system-level fault diagnosis has been a fertile research area for the last few years. There are two types of system-level fault diagnosis methods: classical and adaptive. The classical methods select a set of tests, obtain the results of all these tests, and then process the results to identify the faulty units. The adaptive methods first identify just one fault-free unit and then use it to identify all faulty units. Either type of method can assume so-called symmetric or asymmetric test invalidation. The former states that tests performed by fault-free units always give correct results, while tests performed by faulty units can produce any results. The latter states that a faulty unit always fails a test, even if the unit performing the test is itself faulty. We survey a number of diagnosis methods of each type under both invalidation assumptions. Each method is considered in the context of a particular diagnostic model (for example, the Boolean n-cube model, in which processors are represented by the nodes and links by the edges of a graph). Finally, a comparison of the two types of methods shows that the classical methods are faster (they require fewer steps for diagnosis) but less efficient (they may misdiagnose more fault-free units as faulty) than the adaptive methods.
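To make the two invalidation assumptions and the classical "collect all results, then process them" approach concrete, here is a minimal Python sketch. It is not any specific method from the survey. Assumptions, all hypothetical: a test result is encoded as 0 (pass) or 1 (fail); an unreliable result from a faulty tester is modeled as a coin flip; the test assignment is a cyclic one in which each of n = 2t + 1 units tests the next t units; and the "processing" step is a brute-force search for every fault set of size at most t that is consistent with the collected results (the syndrome). The names `outcome`, `make_syndrome`, `consistent`, and `diagnose` are illustrative only.

```python
from itertools import combinations
import random

PASS, FAIL = 0, 1

def outcome(tester, tested, faulty, model, rng):
    """Result reported when `tester` tests `tested`, given the true fault set."""
    if tester not in faulty:
        # A fault-free tester always reports the truth (both models).
        return FAIL if tested in faulty else PASS
    if model == "asymmetric" and tested in faulty:
        # Asymmetric invalidation: a faulty unit fails every test,
        # even one performed by a faulty tester.
        return FAIL
    # Otherwise a faulty tester's result is unreliable; model it as a coin flip.
    return rng.randint(PASS, FAIL)

def make_syndrome(tests, faulty, model, rng):
    """Classical step 1: run every test in the assignment and record the results."""
    return {(u, v): outcome(u, v, faulty, model, rng) for (u, v) in tests}

def consistent(candidate, syndrome, model):
    """Could the fault set `candidate` have produced this syndrome?"""
    for (u, v), result in syndrome.items():
        if u not in candidate:
            # Fault-free tester: the result must match v's status exactly.
            if result != (FAIL if v in candidate else PASS):
                return False
        elif model == "asymmetric" and v in candidate and result == PASS:
            # Faulty tester, but a faulty unit can never pass under asymmetric invalidation.
            return False
    return True

def diagnose(n, syndrome, t, model):
    """Classical step 2: list every fault set of size <= t consistent with the syndrome."""
    return [set(c) for k in range(t + 1) for c in combinations(range(n), k)
            if consistent(set(c), syndrome, model)]

# Hypothetical cyclic test assignment: unit i tests units i+1, ..., i+t (mod n).
n, t = 5, 2
tests = [(i, (i + j) % n) for i in range(n) for j in range(1, t + 1)]

rng = random.Random(1987)
faulty = {1, 3}
for model in ("symmetric", "asymmetric"):
    syndrome = make_syndrome(tests, faulty, model, rng)
    print(model, diagnose(n, syndrome, t, model))
```

With this assignment, any two distinct fault sets of size at most 2 differ in the result of at least one test performed by a unit outside both sets, so each run should report exactly one candidate, the true fault set. Under asymmetric invalidation the consistency check is strictly tighter, so it never admits more candidates than the symmetric check on the same syndrome.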