Failure prediction based on log files using Random Indexing and Support Vector Machines

Authors:
Ilenia Fronza;Alberto Sillitti;Giancarlo Succi;Mikko Terho;Jelena Vlasenko
Affiliations:
Center for Applied Software Engineering, Faculty of Computer Science, Free University of Bolzano-Bozen, Italy;Center for Applied Software Engineering, Faculty of Computer Science, Free University of Bolzano-Bozen, Italy;Center for Applied Software Engineering, Faculty of Computer Science, Free University of Bolzano-Bozen, Italy;Nokia, Visiokatu, 3, FI-33720 Tampere, Finland;Center for Applied Software Engineering, Faculty of Computer Science, Free University of Bolzano-Bozen, Italy
Venue:
Journal of Systems and Software
Year:
2013

Abstract

Research problem: Failures can have a substantial impact on software systems, since recovery can require unexpected amounts of time and resources. Accurate failure predictions help mitigate this impact: resources, applications, and services can be scheduled to limit it. However, providing accurate predictions sufficiently far in advance is challenging. Log files contain messages that represent changes of system state, and a sequence or pattern of messages may be used to predict failures.

Contribution: We describe an approach to predict failures based on log files using Random Indexing (RI) and Support Vector Machines (SVMs).

Method: RI is applied to represent sequences: each operation is characterized in terms of its context. SVMs assign each sequence to one of two classes, failures or non-failures. Weighted SVMs are applied to deal with imbalanced datasets and to improve the true positive rate. We apply our approach to log files collected during approximately three months of work in a large European manufacturing company.

Results: According to our results, weighted SVMs sacrifice some specificity to improve sensitivity. Specificity remains higher than 0.80 in four out of six analyzed applications.

Conclusions: Overall, our approach is very reliable in predicting both failures and non-failures.
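
The two techniques named in the Method part of the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and the abstract does not give their parameters, kernel, or solver. Everything below is an assumption made for illustration: the constants `DIM`, `NONZERO`, and `WINDOW`, the averaging step in `sequence_vector`, the operation names in `logs`, and the use of a linear SVM trained by subgradient descent on a class-weighted hinge loss in place of a library SVM.

```python
import random

DIM = 64      # vector dimensionality (assumed value)
NONZERO = 4   # +1/-1 entries per index vector (assumed value)
WINDOW = 2    # neighbours considered on each side (assumed value)


def index_vector(rng):
    """Sparse ternary random vector: NONZERO entries are +1 or -1, the rest 0."""
    vec = [0] * DIM
    for k, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1 if k % 2 == 0 else -1
    return vec


def context_vectors(sequences, seed=0):
    """Random Indexing: an operation's context vector is the sum of the index
    vectors of the operations that occur within WINDOW positions of it."""
    rng = random.Random(seed)
    vocab = sorted({op for seq in sequences for op in seq})
    index = {op: index_vector(rng) for op in vocab}
    context = {op: [0] * DIM for op in vocab}
    for seq in sequences:
        for i, op in enumerate(seq):
            for j in range(max(0, i - WINDOW), min(len(seq), i + WINDOW + 1)):
                if j != i:
                    context[op] = [c + v for c, v in zip(context[op], index[seq[j]])]
    return context


def sequence_vector(seq, context):
    """Assumed aggregation: average the context vectors of the operations."""
    return [sum(context[op][d] for op in seq) / len(seq) for d in range(DIM)]


def train_weighted_svm(X, y, pos_weight, epochs=200, lr=0.01, lam=0.01):
    """Linear SVM trained by subgradient descent on a class-weighted hinge loss.
    Labels are +1 (failure) and -1 (non-failure); errors on the failure class
    cost pos_weight times as much as errors on the non-failure class."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            cost = pos_weight if yi == 1 else 1.0
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            violated = margin < 1
            # Regularisation always shrinks w; the hinge term acts only
            # when the sample is inside the margin or misclassified.
            w = [wj - lr * (lam * wj - (cost * yi * xj if violated else 0.0))
                 for wj, xj in zip(w, xi)]
            if violated:
                b += lr * cost * yi
    return w, b


def predict(w, b, x):
    """Return +1 (failure) or -1 (non-failure) for a feature vector x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1


# Hypothetical operation sequences; the last one is labelled as a failure.
logs = [["open", "read", "close"], ["open", "write", "close"],
        ["open", "read", "timeout", "retry", "timeout"]]
labels = [-1, -1, 1]
ctx = context_vectors(logs)
X = [sequence_vector(seq, ctx) for seq in logs]
w, b = train_weighted_svm(X, labels, pos_weight=2.0)
```

In this sketch, operations that occur in the same contexts (here `read` and `write` in the first two sequences) accumulate the same index vectors, which is the sense in which an operation is characterized by its context. Raising `pos_weight` makes margin violations on the rare failure class more expensive, which is one common way to trade specificity for sensitivity on imbalanced data.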