The console logs generated by an application contain messages that the application's developers believed would be useful for debugging or monitoring it. Despite their ubiquity and large size, these logs are rarely exploited systematically for monitoring and debugging because they are not readily machine-parsable. In this paper, we propose a novel method for mining this rich source of information. First, we combine log parsing and text mining with source code analysis to extract structure from the console logs. Second, we extract features from the structured information and use Principal Component Analysis (PCA) to detect anomalous patterns in the logs. Finally, we use a decision tree to distill the results of PCA-based anomaly detection into a form readily understandable by domain experts (e.g., system operators), who need not be familiar with the anomaly detection algorithms. As a case study, we distill over one million lines of console logs from the Hadoop file system into a simple decision tree that a domain expert can readily understand. The process requires no operator intervention, and it detects a large portion of runtime anomalies that are commonly overlooked.
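The first two steps above (extracting structure from free-text lines, then deriving features) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The templates in `TEMPLATES` are hand-written, hypothetical stand-ins for what source code analysis would recover, the log lines are made up, and the choice of per-identifier message counts as the feature is an assumption for illustration. The PCA and decision tree stages are not shown.

```python
import re
from collections import Counter, defaultdict

# Hypothetical message templates. In the approach described above these would
# come from source code analysis; here they are hand-written for illustration.
TEMPLATES = {
    "receiving": re.compile(r"Receiving block (?P<block>blk_\d+) src: (?P<src>\S+)"),
    "received": re.compile(r"Received block (?P<block>blk_\d+) of size (?P<size>\d+)"),
    "deleting": re.compile(r"Deleting block (?P<block>blk_\d+)"),
}
MESSAGE_TYPES = sorted(TEMPLATES)  # fixed column order for the count vectors


def parse_line(line):
    """Return (message_type, fields) for the first matching template, else None."""
    for name, pattern in TEMPLATES.items():
        match = pattern.search(line)
        if match:
            return name, match.groupdict()
    return None


def count_vectors(lines, key="block"):
    """Group parsed messages by an identifier field and count each message type."""
    counts = defaultdict(Counter)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # lines matching no template are skipped in this sketch
        name, fields = parsed
        counts[fields[key]][name] += 1
    # One row per identifier, one column per message type (missing types count 0).
    return {ident: [c[t] for t in MESSAGE_TYPES] for ident, c in counts.items()}


# Made-up log lines, not taken from any real system.
SAMPLE_LOG = [
    "INFO Receiving block blk_1 src: /10.0.0.1:50010",
    "INFO Received block blk_1 of size 1024",
    "INFO Receiving block blk_2 src: /10.0.0.2:50010",
    "INFO Deleting block blk_1",
    "WARN something unrelated happened",
]

vectors = count_vectors(SAMPLE_LOG)
# Columns are ["deleting", "received", "receiving"]:
# vectors == {"blk_1": [1, 1, 1], "blk_2": [0, 0, 1]}
```

Each row of `vectors` is a fixed-length numeric vector, which is the kind of input a PCA-based detector could consume. Here `blk_2` is received but never acknowledged or deleted, so its row differs from the pattern of `blk_1`.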