Fault Detection in Distributed Systems by Representative Subspace Mapping

  • Authors: Haifeng Chen; Guofei Jiang; Kenji Yoshihira

  • Affiliations: NEC Laboratories America, Inc., NJ (all authors)

  • Venue: ICPR '06 Proceedings of the 18th International Conference on Pattern Recognition - Volume 04
  • Year: 2006

Abstract

The high dimensionality of system observations, together with frequent changes in normal system behavior caused by workload variations, makes fault detection very difficult in distributed computing systems. This paper addresses these issues by proposing a novel statistical technique, principal canonical correlation analysis (PCCA), and applying it to monitor the system in a supervised manner. Given a set of input variables u and system measurements x, PCCA extracts a subspace x(1) from x that is not only highly correlated with the input u but also representative of the overall distribution of x. This property of PCCA, which combines the strengths of both PCA and CCA, is beneficial to the fault detection task. Experimental results from a real e-commerce system based on the multi-tiered J2EE architecture demonstrate the effectiveness of PCCA.
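
The abstract does not state how PCCA is formulated, so the sketch below is not the authors' algorithm. It is a simple two-stage stand-in, assumed here only to illustrate the idea of a subspace of x that is both representative of x (PCA stage) and correlated with u (CCA stage). The function name `pca_then_cca` and the parameters `n_pcs` and `n_dirs` are hypothetical.

```python
import numpy as np

def pca_then_cca(u, x, n_pcs, n_dirs):
    """Illustrative two-stage stand-in (PCA, then CCA); NOT the paper's PCCA.

    u: (samples, inputs) array of input variables.
    x: (samples, features) array of system measurements.
    Returns weights for u, weights for x, and the canonical correlations.
    """
    # Center both data sets so that correlations are well defined.
    u_c = u - u.mean(axis=0)
    x_c = x - x.mean(axis=0)

    # Stage 1 (PCA): keep the directions of x with the largest variance,
    # so the retained subspace is representative of x's distribution.
    _, _, vt = np.linalg.svd(x_c, full_matrices=False)
    pcs = vt[:n_pcs].T                      # (features, n_pcs)
    z = x_c @ pcs                           # principal-component scores

    # Stage 2 (CCA): inside that subspace, find the directions most
    # correlated with u. QR orthonormalizes each block; the singular
    # values of qu.T @ qz are then the canonical correlations.
    qu, ru = np.linalg.qr(u_c)
    qz, rz = np.linalg.qr(z)
    left, corr, right_t = np.linalg.svd(qu.T @ qz, full_matrices=False)

    a = np.linalg.solve(ru, left[:, :n_dirs])       # weights on u
    b = np.linalg.solve(rz, right_t[:n_dirs].T)     # weights on the scores
    w = pcs @ b                                     # weights on the original x
    return a, w, corr[:n_dirs]
```

Here `(x - x.mean(axis=0)) @ w` plays the role of the extracted subspace x(1) under these assumptions. `n_dirs` should not exceed the smaller of `n_pcs` and the number of input variables.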