C4.5: programs for machine learning
C4.5: programs for machine learning
Distributional clustering of words for text classification
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
Pinpoint: Problem Determination in Large, Dynamic Internet Services
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
Term Weighting Approaches in Automatic Text Retrieval
Term Weighting Approaches in Automatic Text Retrieval
The Journal of Machine Learning Research
Filtering Failure Logs for a BlueGene/L Prototype
DSN '05 Proceedings of the 2005 International Conference on Dependable Systems and Networks
Failure Diagnosis Using Decision Trees
ICAC '04 Proceedings of the First International Conference on Autonomic Computing
An integrated framework on mining logs files for computing system management
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
BlueGene/L Failure Analysis and Prediction Models
DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
What Supercomputers Say: A Study of Five System Logs
DSN '07 Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
Information-theoretic modeling for tracking the health of complex software systems
CASCON '08 Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds
Detection and Diagnosis of Recurrent Faults in Software Systems by Invariant Analysis
HASE '08 Proceedings of the 2008 11th IEEE High Assurance Systems Engineering Symposium
Discovering actionable patterns in event data
IBM Systems Journal
Symptom-based problem determination using log data abstraction
Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research
Assisting failure diagnosis through filesystem instrumentation
Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research
Hi-index | 0.00 |
Enterprise software systems (ESS) are becoming larger and increasingly complex. Failure in business-critical systems is expensive, leading to consequences such as loss of critical data, loss of sales, customer dissatisfaction, even law suits. Therefore, detecting failures and diagnosing their root-cause in a timely manner is essential. Many studies suggest that a large fraction of failures encountered in practice are recurrent (i.e., they have been seen before). Fast and accurate detection of these failures can accelerate problem determination, and thereby improve system reliability. To this effect, we explore machine learning techniques, including the Naïve Bayes classifier, partially-supervised learning, and decision trees (using C4.5), to automatically recognize symptoms of recurrent faults and to derive detection rules from samples of log data. This work focuses on log files, since they are readily available and they do not put any additional computational burden on the component generating the data. The methods explored in this work can aid the development of tools to assist support personnel in problem determination tasks. Instead of requiring the operators to manually define patterns for identifying recurrent problems, such tools can be trained using prior, solved and unsolved cases from existing support databases.