We compare machine learning methods applied to a difficult real-world problem: predicting computer hard-drive failure using attributes monitored internally by individual drives. The problem is one of detecting rare events in a time series of noisy, nonparametrically distributed data. We develop a new algorithm, mi-NB, based on the multiple-instance learning framework and the naive Bayesian classifier; it is designed specifically for the low false-alarm case and is shown to have promising performance. The other methods compared are support vector machines (SVMs), unsupervised clustering, and nonparametric statistical tests (rank-sum and reverse arrangements). The SVM, rank-sum, and mi-NB algorithms predict failures considerably better than the threshold method currently implemented in drives, while maintaining low false-alarm rates. Our results suggest that nonparametric statistical tests should be considered for learning problems that involve detecting rare events in time series data. An appendix details the calculation of rank-sum significance probabilities for discrete, tied observations, and we give new recommendations about when the exact calculation should be used instead of the commonly used normal approximation. The normal approximation may be particularly inaccurate for rare-event problems such as hard-drive failures.
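To make the contrast between the exact rank-sum calculation and the normal approximation concrete, the following is a minimal sketch and not the procedure from the paper's appendix. It assumes a one-sided (upper-tail) test, uses midranks for tied values, applies the standard tie-corrected variance without a continuity correction, and computes the exact p-value by brute-force enumeration, which is only practical for small samples. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def midranks(values):
    """Return 1-based ranks, giving tied observations their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum(test, reference):
    """Rank-sum statistic W of `test` within the pooled sample, plus all ranks."""
    ranks = midranks(list(test) + list(reference))
    return sum(ranks[:len(test)]), ranks


def exact_p_value(test, reference):
    """Exact upper-tail p-value, conditional on the observed tie pattern.

    Enumerates every way to choose len(test) ranks from the pooled ranks,
    so the cost grows combinatorially with sample size.
    """
    w, ranks = rank_sum(test, reference)
    total = extreme = 0
    for subset in combinations(ranks, len(test)):
        total += 1
        if sum(subset) >= w - 1e-9:
            extreme += 1
    return extreme / total


def normal_p_value(test, reference):
    """Upper-tail p-value from the tie-corrected normal approximation."""
    n, m = len(test), len(reference)
    size = n + m
    w, _ = rank_sum(test, reference)
    mean = n * (size + 1) / 2
    # Tie correction: sum of t^3 - t over groups of t tied observations.
    tie_term = sum(t**3 - t for t in Counter(list(test) + list(reference)).values())
    variance = n * m / 12 * ((size + 1) - tie_term / (size * (size - 1)))
    if variance <= 0:
        return 1.0  # every observation tied: the statistic carries no information
    z = (w - mean) / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2))


# Heavily tied, discrete toy data (illustrative values only).
test_sample = [2, 2]
reference_sample = [1, 1, 1, 2]
print(exact_p_value(test_sample, reference_sample))
print(normal_p_value(test_sample, reference_sample))
```

On this toy sample the two p-values differ substantially, which illustrates why the choice between the exact calculation and the approximation can matter when observations are discrete, heavily tied, and few in number.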