Leveraging many simple statistical models to adaptively monitor software systems

  • Authors:
  • Mohammad A. Munawar; Paul A. S. Ward

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of Waterloo, 200 University Avenue, Waterloo, Ontario, N2L 3G1, Canada (both authors)

  • Venue:
  • International Journal of High Performance Computing and Networking
  • Year:
  • 2011

Abstract

Ensuring that a software system meets its objectives requires continuous monitoring. In practice, monitoring is either insufficient to detect and diagnose failures effectively, or too costly to use in production. An alternative is adaptive monitoring, where the system is monitored at a minimal level to determine system health and, if a problem is suspected, the monitoring level is automatically increased to diagnose the fault. To model the system at different monitoring levels, we employ statistical techniques to identify stable relationships in the monitored data. These relationships characterise normal operation and can help detect anomalies. We describe our approach in the context of a J2EE-based system. We show that adaptive monitoring is a cost-effective alternative to continuous detailed monitoring. We inject 29 different faults, and show that we detect the faults in 80% of cases and shortlist the faulty component in 65% of the detected cases.
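
The idea of many simple models can be illustrated with a minimal sketch. The abstract does not say which statistical technique is used, so the code below assumes pairwise least-squares regression between metrics; the function names, the R² cut-off, the residual tolerance, and the rule for raising or resetting the monitoring level are all hypothetical choices made for illustration, not the authors' implementation.

```python
from itertools import combinations
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit y = a + b*x. Returns (a, b, r2), or None if a metric is constant."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None
    b = sxy / sxx
    a = my - b * mx
    r2 = (sxy * sxy) / (sxx * syy)
    return a, b, r2


def learn_models(training, min_r2=0.9, tolerance=2.0):
    """Keep one simple model per metric pair whose relationship is stable.

    training maps a metric name to samples collected during normal operation.
    min_r2 and tolerance are assumed thresholds, not values from the paper.
    """
    models = []
    for x_name, y_name in combinations(sorted(training), 2):
        xs, ys = training[x_name], training[y_name]
        fit = fit_line(xs, ys)
        if fit is None or fit[2] < min_r2:
            continue  # relationship too weak to characterise normal operation
        a, b, _ = fit
        worst = max(abs(y - (a + b * x)) for x, y in zip(xs, ys))
        models.append((x_name, y_name, a, b, tolerance * worst + 1e-9))
    return models


def violated_models(models, sample):
    """Return the metric pairs whose learned relationship the sample breaks."""
    return [
        (x_name, y_name)
        for x_name, y_name, a, b, bound in models
        if abs(sample[y_name] - (a + b * sample[x_name])) > bound
    ]


def next_level(current, violations, max_level):
    """Assumed policy: raise monitoring by one level on suspicion, else return to minimal."""
    return min(current + 1, max_level) if violations else 0
```

In this sketch, a sample that breaks a learned relationship is treated as a suspected problem and triggers more detailed monitoring; the metrics involved in the violated pairs would then be natural candidates when shortlisting a faulty component.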