Distributed Diagnosis of Failures in a Three Tier E-Commerce System

  • Authors:
  • Gunjan Khanna;Ignacio Laguna;Fahad A. Arshad;Saurabh Bagchi

  • Affiliations:
  • Purdue University;Purdue University;Purdue University;Purdue University

  • Venue:
  • SRDS '07 Proceedings of the 26th IEEE International Symposium on Reliable Distributed Systems
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

For dependability outages in distributed internet infrastructures, it is often not enough to detect a failure, but it is also required to diagnose it, i.e., to identify its source. Complex applications deployed in multi-tier environments make diagnosis challenging because of fast error propagation, black-box applications, high diagnosis delay, the amount of states that can be maintained, and imperfect diagnostic tests. Here, we propose a probabilistic diagnosis model for arbitrary failures in components of a distributed application. The monitoring system (the Monitor) passively observes the message exchanges between the components and, at runtime, performs a probabilistic diagnosis of the component that was the root cause of a failure. We demonstrate the approach by applying it to the Pet Store J2EE application, and we compare it with Pinpoint by quantifying latency and accuracy in both systems. The Monitor outperforms Pinpoint by achieving comparably accurate diagnosis with higher precision in shorter time.