This paper describes an approach to root cause analysis and fault correlation that addresses the problems inherent in wireless military networks. Root cause analysis is concerned with identifying and correcting problems in a network. Its goal is to diagnose the cause of network anomalies so that adequate communication functionality is maintained to support the requirements of the network users. In a wired network, fault diagnosis is easier because the topology and links are fixed. In the wireless networks at the tactical edge of military networks, there is no hardwired connectivity, yet end users still have expectations of the network that constrain its operation. Fault diagnosis in such networks is fundamentally different from that in wired networks. Network performance must be managed explicitly with respect to user expectations, even though network connectivity is dynamic, monitoring traffic must traverse the (possibly failing) network itself, and the "correct" behavior against which the current network state is compared evolves over time. The novel features of our solution that distinguish it from existing root cause analysis techniques are (a) a dynamic model of fault, performance, and security problem propagation in the network that can evolve as the definition of network correctness changes, (b) a method for distributing reasoning over this model throughout the network into independent Correlators that share information through a set of Clearinghouses to provide a global root cause correlation capability, and (c) the ability of the Correlator and Clearinghouse reasoning to adapt gracefully when network problems prevent the full exchange of information required for root cause analysis.
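The division of labor among features (a) through (c) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the reasoning algorithm, so the symptom names, the `MODEL` dictionary, the class interfaces, and the coverage-based scoring rule below are all hypothetical assumptions chosen only to show the shape of the idea. The model is a plain dictionary that can be edited at run time, Correlators pool evidence through a Clearinghouse, and a Correlator falls back to local evidence (flagging the result as degraded) when the Clearinghouse cannot be reached.

```python
# Hypothetical propagation model: root cause -> symptoms it is expected to produce.
# It is ordinary mutable data, so it can change as the notion of "correct" changes.
MODEL = {
    "link_jamming": {"high_loss", "low_snr"},
    "node_failure": {"high_loss", "no_heartbeat"},
}


class Clearinghouse:
    """Hypothetical shared store of symptoms reported by Correlators."""

    def __init__(self):
        self.reachable = True  # set to False to simulate a network problem
        self._symptoms = set()

    def publish(self, symptoms):
        if not self.reachable:
            raise ConnectionError("clearinghouse unreachable")
        self._symptoms |= set(symptoms)

    def fetch(self):
        if not self.reachable:
            raise ConnectionError("clearinghouse unreachable")
        return set(self._symptoms)


class Correlator:
    """Hypothetical local reasoner that ranks root causes by symptom coverage."""

    def __init__(self, name, clearinghouse, model):
        self.name = name
        self.clearinghouse = clearinghouse
        self.model = model
        self.local = set()

    def observe(self, symptom):
        self.local.add(symptom)

    def diagnose(self):
        """Return (best_cause, score, degraded)."""
        try:
            # Share local evidence, then reason over the pooled evidence.
            self.clearinghouse.publish(self.local)
            evidence = self.local | self.clearinghouse.fetch()
            degraded = False
        except ConnectionError:
            # Graceful degradation: reason over local evidence only.
            evidence = set(self.local)
            degraded = True
        # Score = fraction of a cause's expected symptoms that were observed.
        scores = {
            cause: len(expected & evidence) / len(expected)
            for cause, expected in self.model.items()
        }
        best = max(scores, key=scores.get)
        return best, scores[best], degraded
```

With two Correlators that each see only one symptom, the pooled evidence can single out a cause that neither could confirm alone. If the Clearinghouse becomes unreachable, each Correlator still returns a local answer, marked as degraded.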