Monitoring High-Dimensional Data for Failure Detection and Localization in Large-Scale Computing Systems

  • Authors:
  • Haifeng Chen; Guofei Jiang; Kenji Yoshihira

  • Venue:
  • IEEE Transactions on Knowledge and Data Engineering
  • Year:
  • 2008

Abstract

Processing high-dimensional measurements for failure detection and localization is a major challenge in large-scale computing systems. In information systems, however, those measurements usually lie on a low-dimensional structure embedded in the high-dimensional space. Building on this observation, this paper proposes a novel approach that models the geometry of the underlying data generation and detects anomalies based on that model. We consider both linear and nonlinear data generation models. Two statistics, the Hotelling $T^2$ and the squared prediction error ($SPE$), capture data variation within and outside the model, respectively. We track the probability density of these statistics to monitor the system's health. Once a failure has been detected, a localization process identifies the attributes most likely to be related to it. Experimental results on both synthetic data and a real e-commerce application demonstrate the effectiveness of the approach in detecting and localizing failures in computing systems.
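
The abstract gives no formulas, so the following is only a minimal sketch of the linear case, using the textbook PCA form of the two statistics rather than the authors' implementation. Several pieces are assumptions made for illustration: the function names, the synthetic data (two groups of five correlated attributes), the empirical 99% quantile used as an alarm limit in place of the density tracking the abstract describes, and the ranking of attributes by squared residual as a stand-in for the localization process.

```python
import numpy as np

def fit_linear_model(X, k):
    """Fit a k-dimensional PCA subspace to normal-operation data X (n x d)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions of the centered data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                          # d x k basis of the model subspace
    var = s[:k] ** 2 / (X.shape[0] - 1)   # variance along each direction
    return mean, P, var

def t2_and_spe(x, mean, P, var):
    """Return Hotelling T^2, SPE, and the residual vector for sample(s) x."""
    xc = x - mean
    scores = xc @ P                       # coordinates inside the model
    t2 = np.sum(scores ** 2 / var, axis=-1)
    residual = xc - scores @ P.T          # part of x the model cannot explain
    spe = np.sum(residual ** 2, axis=-1)
    return t2, spe, residual

# Synthetic normal data (assumed): 10 attributes driven by 2 latent factors.
rng = np.random.default_rng(0)
n, d, k = 500, 10, 2
A = np.zeros((k, d))
A[0, :5] = 1.0    # attributes 0-4 follow the first factor
A[1, 5:] = 1.0    # attributes 5-9 follow the second factor
X = rng.normal(size=(n, k)) @ A + 0.05 * rng.normal(size=(n, d))

mean, P, var = fit_linear_model(X, k)
t2_train, spe_train, _ = t2_and_spe(X, mean, P, var)

# Assumed alarm limit: empirical 99% quantile of SPE on normal data.
spe_limit = np.quantile(spe_train, 0.99)

# Inject a fault on attribute 3 only, breaking its correlation with 0-4.
x_fault = X[0].copy()
x_fault[3] += 5.0
t2_f, spe_f, res_f = t2_and_spe(x_fault, mean, P, var)

detected = spe_f > spe_limit
# Assumed localization heuristic: attribute with the largest squared residual.
suspect = int(np.argmax(res_f ** 2))
print(f"SPE={spe_f:.2f}, limit={spe_limit:.4f}, detected={detected}, suspect={suspect}")
```

In this sketch the fault moves the sample off the fitted subspace, so it shows up mainly in $SPE$; a shift along the subspace itself would instead appear in $T^2$. The nonlinear models mentioned in the abstract are not covered here.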