PAL: Propagation-aware Anomaly Localization for cloud hosted distributed applications

  • Authors:
  • Hiep Nguyen;Yongmin Tan;Xiaohui Gu

  • Affiliations:
  • North Carolina State University;North Carolina State University;North Carolina State University

  • Venue:
  • SLAML '11 Managing Large-scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques
  • Year:
  • 2011

Quantified Score

Hi-index 0.01

Visualization

Abstract

Distributed applications running inside cloud are prone to performance anomalies due to various reasons such as insufficient resource allocations, unexpected workload increases, or software bugs. However, those applications often consist of multiple interacting components where one component anomaly may cause its dependent components to exhibit anomalous behavior as well. It is challenging to identify the faulty components among numerous distributed application components. In this paper, we present a Propagation-aware Anomaly Localization (PAL) system that can pinpoint the source faulty components in distributed applications by extracting anomaly propagation patterns. PAL provides a robust critical change point discovery algorithm to accurately capture the onset of anomaly symptoms at different application components. We then derive the propagation pattern by sorting all critical change points in chronological order. PAL is completely application-agnostic and non-intrusive, which only relies on system-level metrics. We have implemented PAL on top of the Xen platform and tested it on a production cloud computing infrastructure using the RUBiS online auction benchmark application and the IBM System S data streaming processing application with a range of common software bugs. Our experimental results show that PAL can pinpoint faulty components in distributed applications with high accuracy and low overhead.