Fast and accurate fault detection is becoming an essential component of management software for mission-critical systems. A good fault detector makes it possible to initiate repair actions quickly, which increases the availability of the system. The contribution of this paper is twofold. First, a new concept of supervised and unsupervised monitoring is proposed for system fault detection. We use a statistical method, canonical correlation analysis (CCA), to model the contextual dependencies between the system inputs u and the internal behavior x. By means of CCA, the space x is transformed into two subsets of variables, which are monitored in a supervised and an unsupervised manner, respectively. In this way, our approach reduces the false alarms caused by unusual workload changes and hence achieves a high fault detection rate. Second, to evaluate the performance of our approach, we simulate a variety of system faults in a real e-commerce application based on the multi-tiered J2EE architecture. Experimental results demonstrate that the CCA-based approach can detect injected failures at their early stages, when the anomalous symptoms are still very weak, and hence can contribute to substantial time and cost savings in managing large-scale distributed systems.
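The abstract describes the CCA split only at a high level. The following Python sketch shows one way such a split could be realized, assuming NumPy and a roughly linear-Gaussian relation between inputs u (one row per sample, p workload features) and internal metrics x (q metrics). It is an illustration, not the paper's algorithm. CCA whitens both spaces and takes an SVD of the whitened cross-covariance. Canonical variates of x whose canonical correlation with u is at least a cut-off are checked against their prediction from u (the "supervised" part). The remaining variates, which are uncorrelated with u on the training data, are checked against their own training distribution (the "unsupervised" part). The class name `CCAMonitor`, the cut-off `rho_min`, the squared-score statistics, and the empirical 99th-percentile alarm thresholds are all hypothetical choices.

```python
import numpy as np


def _inv_sqrt(s, ridge=1e-6):
    # Inverse square root of a symmetric PSD covariance; a small ridge keeps it invertible.
    w, v = np.linalg.eigh(s + ridge * np.eye(s.shape[0]))
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T


class CCAMonitor:
    """Hypothetical sketch: split metrics x into a u-correlated part (supervised)
    and a u-uncorrelated part (unsupervised), and score each separately."""

    def __init__(self, rho_min=0.5, quantile=0.99):
        self.rho_min = rho_min    # assumed cut-off for "correlated with the inputs u"
        self.quantile = quantile  # assumed alarm quantile of the training scores

    def fit(self, u, x):
        self.mu_u, self.mu_x = u.mean(axis=0), x.mean(axis=0)
        uc, xc = u - self.mu_u, x - self.mu_x
        n = len(u)
        suu, sxx, sxu = uc.T @ uc / n, xc.T @ xc / n, xc.T @ uc / n
        kx, ku = _inv_sqrt(sxx), _inv_sqrt(suu)
        # SVD of the whitened cross-covariance; singular values are the
        # canonical correlations, returned in descending order.
        a, rho, bt = np.linalg.svd(kx @ sxu @ ku)
        self.wx = kx @ a       # q x q, columns are canonical directions in x-space
        self.wu = ku @ bt.T    # p x p, columns are canonical directions in u-space
        self.k = int(np.sum(rho >= self.rho_min))
        # Clamp so the residual scale sqrt(1 - rho^2) never reaches zero.
        self.rho = np.minimum(rho[: self.k], 0.999)
        sup, unsup = self._scores(u, x)
        self.t_sup = np.quantile(sup, self.quantile)
        self.t_unsup = np.quantile(unsup, self.quantile)
        return self

    def _scores(self, u, x):
        z = (x - self.mu_x) @ self.wx  # canonical variates of x (about unit variance)
        v = (u - self.mu_u) @ self.wu  # canonical variates of u (about unit variance)
        k = self.k
        # Supervised: the best linear prediction of z_i from u is rho_i * v_i,
        # leaving a residual with variance 1 - rho_i^2.
        r = (z[:, :k] - self.rho * v[:, :k]) / np.sqrt(1.0 - self.rho ** 2)
        sup = np.sum(r ** 2, axis=1)
        # Unsupervised: the remaining variates are uncorrelated with u in training.
        unsup = np.sum(z[:, k:] ** 2, axis=1)
        return sup, unsup

    def alarms(self, u, x):
        # Boolean array with one flag per sample; raised if either score exceeds its threshold.
        sup, unsup = self._scores(u, x)
        return (sup > self.t_sup) | (unsup > self.t_unsup)
```

In use, the monitor would be fitted on a window of healthy `(u, x)` samples, and `alarms` would then be called on each new batch. Because workload-driven metric movement is scored only through its residual after prediction from u, a burst of requests that the metrics follow consistently should not by itself raise the supervised score. For the fitting step, scikit-learn's `sklearn.cross_decomposition.CCA` is an alternative to the hand-rolled SVD used here.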