Issues with and approaches to network monitoring and problem remediation in military tactical networks

  • Authors:
  • Marian Nodine;Donald Baker;Ritu Chadha;Cho-Yu Jason Chiang;Kimberly Moeltner;Thomas D'Silva;Yogeeta Kumar

  • Affiliations:
  • Telcordia Technologies, Inc., Piscataway, NJ;Telcordia Technologies, Inc., Piscataway, NJ;Telcordia Technologies, Inc., Piscataway, NJ;Telcordia Technologies, Inc., Piscataway, NJ;U.S. Army CERDEC, Fort Monmouth, NJ;U.S. Army CERDEC, Fort Monmouth, NJ;U.S. Army CERDEC, Fort Monmouth, NJ

  • Venue:
  • MILCOM'09 Proceedings of the 28th IEEE conference on Military communications
  • Year:
  • 2009

Abstract

This paper describes an approach to root cause analysis and fault correlation that addresses the problems inherent in wireless military networks. Root cause analysis is concerned with identifying and correcting problems in a network: it diagnoses the causes of network anomalies so that adequate communication functionality is maintained to support the requirements of the network users. In a wired network, fault diagnosis is easier because the topology and links are fixed. The wireless networks at the tactical edge of military networks have no hardwired connectivity, yet end users still have expectations of the network that constrain its operation. Fault diagnosis in such networks is fundamentally different from that in wired networks. The performance of the network must be managed explicitly with respect to user expectations, even though network connectivity is dynamic, the network monitoring traffic must traverse the (possibly failing) network itself, and the "correct" behavior against which the current network state is compared evolves over time. The novel features of our solution that distinguish it from existing root cause analysis techniques are (a) a dynamic model of fault, performance, and security problem propagation in the network that can evolve as the definition of network correctness changes, (b) a method for distributing reasoning over this model throughout the network into independent Correlators that share information through a set of Clearinghouses to provide a global root cause correlation capability, and (c) the ability of the Correlator and Clearinghouse reasoning to adapt gracefully when network problems prevent the full exchange of information required for root cause analysis.
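The abstract does not give implementation details, so the following is only an illustrative sketch of how features (a) through (c) might fit together, not the authors' design. Every name in it is hypothetical (`PropagationModel`, `Clearinghouse`, `Correlator`, the symptom strings), and so is the scoring rule, which ranks a candidate cause by the fraction of its expected symptoms that were observed. The sketch assumes a Correlator falls back to its own local observations when it cannot reach a Clearinghouse.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PropagationModel:
    """Hypothetical model mapping each root cause to the symptoms it can produce."""
    causes: dict[str, set[str]] = field(default_factory=dict)

    def update(self, cause: str, symptoms: set[str]) -> None:
        # (a) The model can be revised at run time as the definition of
        # "correct" network behavior changes.
        self.causes[cause] = set(symptoms)

    def rank(self, observed: set[str]) -> list[tuple[str, float]]:
        # Assumed scoring rule: fraction of a cause's expected symptoms observed.
        scores = []
        for cause, symptoms in self.causes.items():
            if symptoms:
                score = len(symptoms & observed) / len(symptoms)
                if score > 0:
                    scores.append((cause, score))
        # Highest score first; ties broken alphabetically for determinism.
        return sorted(scores, key=lambda cs: (-cs[1], cs[0]))


class Clearinghouse:
    """Hypothetical shared store of symptoms reported by Correlators."""

    def __init__(self) -> None:
        self.reports: dict[str, set[str]] = {}
        self.reachable = True  # set to False to simulate a network partition

    def publish(self, node: str, symptoms: set[str]) -> None:
        if not self.reachable:
            raise ConnectionError("clearinghouse unreachable")
        self.reports[node] = set(symptoms)

    def all_symptoms(self) -> set[str]:
        if not self.reachable:
            raise ConnectionError("clearinghouse unreachable")
        return set().union(*self.reports.values())


class Correlator:
    """Hypothetical per-node reasoner over the shared propagation model."""

    def __init__(self, node: str, model: PropagationModel, hub: Clearinghouse) -> None:
        self.node = node
        self.model = model
        self.hub = hub
        self.local: set[str] = set()

    def observe(self, symptom: str) -> None:
        self.local.add(symptom)

    def diagnose(self) -> tuple[list[tuple[str, float]], str]:
        try:
            # (b) Share local observations and reason over the global view.
            self.hub.publish(self.node, self.local)
            observed, scope = self.hub.all_symptoms(), "global"
        except ConnectionError:
            # (c) Degrade gracefully: reason over local observations only.
            observed, scope = set(self.local), "local-only"
        return self.model.rank(observed), scope
```

In this sketch, two Correlators that each see packet loss on their own node can jointly implicate a shared cause through the Clearinghouse, while a partitioned Correlator still returns a ranked, explicitly local-only diagnosis.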