Proactive fault handling requires continuous assessment of the risk of upcoming failures. One promising approach is online failure prediction, in which the current state of the system is evaluated in order to predict the occurrence of failures in the near future. More specifically, we focus on methods that use event-driven sources such as errors. We use Hidden Semi-Markov Models (HSMMs) for this purpose and demonstrate their effectiveness on field data from a commercial telecommunication system. For comparative analysis we selected three well-known failure prediction techniques: a straightforward method based on a reliability model, the Dispersion Frame Technique by Lin and Siewiorek, and the eventset-based method introduced by Vilalta et al. We assess and compare the methods in terms of precision, recall, F-measure, false-positive rate, and computing time. The experiments suggest that our HSMM approach is very effective for online failure prediction.
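To make the evaluation metrics named above concrete, the following is a minimal sketch of how precision, recall, F-measure, and false-positive rate are conventionally computed from the counts of a predictor's outcomes. The class name `ConfusionCounts`, the function names, and the example counts are illustrative assumptions, not taken from the paper or its data. The balanced F-measure (harmonic mean of precision and recall) is assumed here.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for a failure predictor (hypothetical container)."""
    tp: int  # failure predicted and a failure occurred
    fp: int  # failure predicted but none occurred (false alarm)
    fn: int  # failure occurred but was not predicted (miss)
    tn: int  # no failure predicted and none occurred


def precision(c: ConfusionCounts) -> float:
    # Fraction of failure warnings that were correct.
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: ConfusionCounts) -> float:
    # Fraction of actual failures that were predicted.
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f_measure(c: ConfusionCounts) -> float:
    # Harmonic mean of precision and recall (balanced F-measure assumed).
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def false_positive_rate(c: ConfusionCounts) -> float:
    # Fraction of non-failure situations that triggered a false warning.
    denom = c.fp + c.tn
    return c.fp / denom if denom else 0.0


# Made-up counts for illustration only; not results from the paper.
example = ConfusionCounts(tp=40, fp=10, fn=20, tn=930)
print(f"precision={precision(example):.3f}")
print(f"recall={recall(example):.3f}")
print(f"F-measure={f_measure(example):.3f}")
print(f"FPR={false_positive_rate(example):.4f}")
```

Precision and recall typically trade off against each other as the prediction threshold changes, which is why a combined score such as the F-measure is reported alongside them. The false-positive rate additionally accounts for the correctly handled non-failure cases.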