IEEE Transactions on Software Engineering
Schemes for fault identification in communication networks
IEEE/ACM Transactions on Networking (TON)
A Generic Model for Fault Isolation in Integrated Management Systems
Journal of Network and Systems Management
DRDB: a distributed real-time database server for high-assurance time-critical applications
COMPSAC '97 Proceedings of the 21st International Computer Software and Applications Conference
An Automated Fault Diagnosis System Using Hierarchical Reasoning and Alarm Correlation
WIAPP '99 Proceedings of the 1999 IEEE Workshop on Internet Applications
High speed and robust event correlation
IEEE Communications Magazine
Call Forwarding-Based Active Probing for POTS Fault Isolation
Journal of Network and Systems Management
Towards fault isolation in WDM mesh networks
APNOMS'09 Proceedings of the 12th Asia-Pacific network operations and management conference on Management enabling the future internet for changing business and new computing services
Behavioural Proximity Discovery: an adaptive approach for root cause analysis
International Journal of Business Intelligence and Data Mining
Many timing-constrained (real-time) distributed systems, such as real-time database systems, are now used in safety-critical applications. However, they are subject to system failures caused by malfunctions of the underlying network components. Without the help of network experts or sophisticated management tools, most users cannot resolve these network problems by themselves. Worse, such management tools (e.g., the 'ping' command) are often prohibited for security reasons. Accordingly, we develop a management system that automates network fault identification based solely on analyzing the abnormal events of the monitored timing-constrained distributed system. In this system, a fault identification framework automatically identifies faulty network elements using a two-level fault propagation model that combines Timing Constraint Petri nets with an alarm clustering mechanism. In addition, the concepts of redundant/ringleader alarms and innocent network elements are introduced into the framework to make the diagnosis more effective. Finally, the management system is implemented according to the framework to demonstrate the performance of our fault identification approach.
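The abstract does not give the algorithm, so the following is only a minimal sketch of the general idea of combining "innocent" network elements with alarm clustering. It assumes (hypothetically) that each alarm is reported together with the list of network elements on the communication path that violated its timing constraint, and that paths which met their constraints are also known. Elements on healthy paths are treated as innocent; the remaining suspects are chosen with a greedy set-cover heuristic, and each chosen element's alarms form one cluster. The function name `diagnose`, the input format, and the greedy-cover step are illustrative assumptions; the sketch does not model Timing Constraint Petri nets or the paper's definitions of redundant/ringleader alarms.

```python
from collections import defaultdict


def diagnose(alarm_paths, healthy_paths):
    """Hypothetical sketch: exonerate innocent elements, then cluster alarms.

    alarm_paths:   dict alarm_id -> list of element names on the violating path
                   (assumed input format, not taken from the paper).
    healthy_paths: iterable of element lists for communications that met
                   their timing constraints.
    Returns (clusters, innocent, orphans):
      clusters: dict suspected_element -> sorted list of alarms it explains
      innocent: set of elements seen on at least one healthy path
      orphans:  sorted alarms whose path contains only innocent elements
    """
    # Assumption: any element on a path that behaved correctly is innocent.
    innocent = {element for path in healthy_paths for element in path}

    # Remaining suspect elements for each alarm after exoneration.
    candidates = {a: set(p) - innocent for a, p in alarm_paths.items()}
    orphans = sorted(a for a, c in candidates.items() if not c)
    unexplained = {a for a, c in candidates.items() if c}

    clusters = {}
    while unexplained:
        # For every suspect element, collect the unexplained alarms it could cause.
        coverage = defaultdict(set)
        for alarm in unexplained:
            for element in candidates[alarm]:
                coverage[element].add(alarm)
        # Greedy set cover: pick the element explaining the most alarms,
        # breaking ties by element name for deterministic output.
        best = min(coverage, key=lambda e: (-len(coverage[e]), e))
        clusters[best] = sorted(coverage[best])
        unexplained -= coverage[best]

    return clusters, innocent, orphans


if __name__ == "__main__":
    alarms = {
        "A1": ["h1", "s1", "r1", "h2"],
        "A2": ["h3", "s2", "r1", "h2"],
        "A3": ["h3", "s2", "r2", "h4"],
    }
    healthy = [["h1", "s1", "r2", "h5"], ["h3", "s2", "r3", "h2"]]
    clusters, innocent, orphans = diagnose(alarms, healthy)
    print(clusters)  # {'r1': ['A1', 'A2'], 'h4': ['A3']}
```

In the example, router `r1` ends up explaining two alarms at once, so they collapse into a single cluster. A greedy cover is a common heuristic because finding the smallest set of faulty elements is a minimum set-cover problem, which is NP-hard in general; it is used here for simplicity and is not claimed to be the paper's method.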