Network-Based Problem Detection for Distributed Systems

  • Authors:
  • Hisashi Kashima; Tadashi Tsumura; Tsuyoshi Ide; Takahide Nogayama; Ryo Hirade; Hiroaki Etoh; Takeshi Fukuda

  • Affiliations:
  • IBM Tokyo Research Laboratory; IBM Tokyo Research Laboratory; IBM Tokyo Research Laboratory; IBM Tokyo Research Laboratory; IBM Tokyo Research Laboratory; IBM Tokyo Research Laboratory; IBM Tokyo Research Laboratory

  • Venue:
  • ICDE '05 Proceedings of the 21st International Conference on Data Engineering
  • Year:
  • 2005

Abstract

We introduce a network-based problem detection framework for distributed systems. It comprises a data-mining method that discovers dynamic dependencies among distributed services from transaction data collected from the network, and a novel problem detection method based on the discovered dependencies. From observed containments of transaction execution time periods, we estimate the probabilities of accidental and non-accidental containments, and we build a competitive model for discovering direct dependencies, using a model estimation method based on the online EM algorithm. Using the discovered dependency information, we also propose a hierarchical problem detection framework in which microscopic dependency information is combined with a macroscopic anomaly metric that monitors the behavior of the system as a whole. This is made possible by a network-based design that provides an overall view of the system without any impact on its performance.
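The raw observation the abstract starts from is a "containment": one service's transaction execution period lying entirely inside another's. The sketch below only counts such containments per ordered pair of services; it does not reproduce the paper's estimation of accidental versus non-accidental containments or its online EM model. The `Transaction` record and the `count_containments` function are hypothetical names chosen for illustration, and the input format (service name plus start/end timestamps) is an assumption.

```python
from collections import Counter
from typing import NamedTuple


class Transaction(NamedTuple):
    """Hypothetical record: the service that handled a transaction and its execution period."""
    service: str
    start: float
    end: float


def count_containments(transactions: list[Transaction]) -> Counter:
    """Count, for each ordered pair (outer, inner) of distinct services, how many
    times an execution period of `inner` lies entirely within one of `outer`.

    Identical intervals are counted in only one direction.
    """
    counts: Counter = Counter()
    # Sort by start ascending and end descending, so a containing transaction
    # always precedes the transactions it contains.
    ordered = sorted(transactions, key=lambda t: (t.start, -t.end))
    for i, outer in enumerate(ordered):
        for inner in ordered[i + 1:]:
            if inner.start > outer.end:
                break  # this and all later transactions start after `outer` ends
            if inner.end <= outer.end and inner.service != outer.service:
                counts[(outer.service, inner.service)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical trace: a web request calls an app server, which queries a database.
    log = [
        Transaction("web", 0.0, 10.0),
        Transaction("app", 1.0, 8.0),
        Transaction("db", 2.0, 3.0),
        Transaction("db", 12.0, 13.0),
    ]
    for (outer, inner), n in count_containments(log).items():
        print(f"{outer} contains {inner}: {n}")
```

In this toy trace the pair `("web", "db")` is counted even though `web` reaches `db` only through `app`. Raw counts therefore mix direct, indirect, and purely coincidental containments, which is the ambiguity the paper's probabilistic model for direct dependencies is meant to resolve.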