An Automated Fault Diagnosis System Using Hierarchical Reasoning and Alarm Correlation

  • Authors:
  • C. S. Chao; D. L. Yang; A. C. Liu

  • Affiliations:
  • 100 Wenhua Road, Seatwen, Taichung, Taiwan 407, Republic of China. cschao@netlab.fcu.edu.tw; Department of Information Engineering, Feng Chia University, Taichung, Taiwan 407, Republic of China. dlyang.liu@fcu.edu.tw; Department of Information Engineering, Feng Chia University, Taichung, Taiwan 407, Republic of China. dlyang.liu@fcu.edu.tw

  • Venue:
  • Journal of Network and Systems Management
  • Year:
  • 2001

Abstract

The increasing importance of computer networks in this information age demands a high level of network availability and reliability. As we become more dependent on networks, network faults and downtime become very costly. Sometimes a minor fault can cause critical disruption or irreparable damage to the network while the network manager is overwhelmed by a large number of alarm messages. Developing a practical and effective system for network fault diagnosis is therefore a critical task. In this paper, we develop a hierarchical, domain-oriented reasoning mechanism suited to the delegated management architecture. It is based on the causality graph of a refined network fault propagation model derived from our empirical study. Building on this hierarchical reasoning mechanism, we propose an automated fault diagnosis system called Alarm Correlation View (ACView) for isolating network faults in a multi-domain environment. The system not only automates alarm collection and correlation but also localizes and identifies faults efficiently. Furthermore, an alarm-to-fault mapping strategy is used to strengthen fault reasoning when network fault propagation is uncertain.
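
To make the idea of alarm-to-fault mapping over a causality graph concrete, the following is a minimal illustrative sketch, not the ACView implementation. The fault names, alarm names, the `CAUSALITY` table, and the scoring rule (fraction of a fault's expected alarms that were observed) are all assumptions made for this example; the abstract does not specify them. Alarms are grouped by management domain first, loosely mirroring the domain-oriented reasoning described above.

```python
from collections import defaultdict

# Hypothetical causality graph: each fault maps to the alarms it may raise.
CAUSALITY = {
    "link_down": {"los", "port_down", "bgp_peer_lost"},
    "router_crash": {"port_down", "bgp_peer_lost", "ping_timeout"},
    "fan_failure": {"temp_high"},
}

def rank_faults(observed, causality=CAUSALITY):
    """Rank faults by the fraction of their expected alarms that were observed."""
    scores = {}
    for fault, expected in causality.items():
        matched = expected & observed
        if matched:
            scores[fault] = len(matched) / len(expected)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

def diagnose_by_domain(alarms):
    """Group (domain, alarm) pairs by domain and keep the top candidate fault in each."""
    per_domain = defaultdict(set)
    for domain, alarm in alarms:
        per_domain[domain].add(alarm)
    return {domain: rank_faults(seen)[:1] for domain, seen in per_domain.items()}
```

For example, `diagnose_by_domain([("A", "temp_high"), ("B", "port_down"), ("B", "ping_timeout")])` would report `fan_failure` as the leading candidate in domain A and `router_crash` in domain B. A partial match still yields a candidate, which is one simple way to tolerate missing alarms when fault propagation is uncertain.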