How to keep your head above water while detecting errors

  • Authors:
  • Ignacio Laguna;Fahad A. Arshad;David M. Grothe;Saurabh Bagchi

  • Affiliations:
  • Purdue University;Purdue University;Purdue University;Purdue University

  • Venue:
  • Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Today's distributed systems need runtime error detection to catch errors arising from software bugs, hardware errors, or unexpected operating conditions. A prominent class of error detection techniques operates in a stateful manner, i.e., it keeps track of the state of the application being monitored and then matches state-based rules. Large-scale distributed applications generate a high volume of messages that can overwhelm the capacity of a stateful detection system. An existing approach to handle this is to randomly sample the messages and process a subset. However, this approach, leads to non-determinism with respect to the detection system's view of what state the application is in. This in turn leads to degradation in the quality of detection. We present an intelligent sampling algorithm and a Hidden Markov Model (HMM)-based algorithm to select the messages that the detection system processes and determine the application states such that the non-determinism is minimized. We also present a mechanism for selectively triggering computationally intensive rules based on a light-weight mechanism to determine if the rule is likely to be flagged. We demonstrate the techniques in a detection system called Monitor applied to a J2EE multi-tier application. We empirically evaluate the performance of Monitor under different load conditions and error scenarios and compare it to a previous system called Pinpoint.