Leveraging many simple statistical models to adaptively monitor software systems

  • Authors:
  • Mohammad Ahmad Munawar;Paul A. S. Ward

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada;Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada

  • Venue:
  • ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Self-managing systems require continuous monitoring to ensure correct operation. Detailed monitoring is often too costly to use in production. An alternative is adaptive monitoring, whereby monitoring is kept to a minimal level while the system behaves as expected, and the monitoring level is increased if a problem is suspected. To enable such an approach, we must model the system, both at a minimal level to ensure correct operation, and at a detailed level, to diagnose faulty components. To avoid the complexity of developing an explicit model based on the system structure, we employ simple statistical techniques to identify relationships in the monitored data. These relationships are used to characterize normal operation and identify problematic areas. We develop and evaluate a prototype for the adaptive monitoring of J2EE applications. We experiment with 29 different fault scenarios of three general types, and show that we are able to detect the presence of faults in 80% of cases, where all but one instance of non-detection is attributable to a single fault type. We are able to shortlist the faulty component in 65% of cases where anomalies are observed.