Modern computing systems generate large amounts of log data, which system administrators and domain experts use to understand and optimize system behavior. Most system logs consist of raw, unstructured text. A fundamental challenge in automated log analysis is generating system events from raw textual logs. Log messages are relatively short, but they may draw on a large vocabulary, which often leads to poor performance when traditional text clustering techniques are applied to log data. Other related methods have various limitations and work well only for particular kinds of system logs. In this paper, we propose logSig, a message-signature-based algorithm that generates system events from textual log messages. By searching for the most representative message signatures, logSig categorizes log messages into a set of event types. logSig can handle various types of log data and can incorporate human domain knowledge to achieve high performance. We conduct experiments on five real system log datasets, and the results show that logSig outperforms alternative algorithms in overall performance.
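The abstract describes logSig only at a high level: each event type is represented by a message signature, and messages are grouped by the signature that best represents them. To make that idea concrete, here is a minimal k-means-style sketch. It is not logSig's actual search procedure or objective. Several parts are assumptions of this sketch and do not come from the source. A signature is treated as an ordered word subsequence, computed by folding longest common subsequence (LCS) over a group's tokens. Similarity is a hypothetical Dice-style ratio 2|LCS|/(|sig|+|msg|). Tokenization is plain whitespace splitting, and the user supplies one seed message per event type. The function names `lcs`, `signature`, `similarity`, and `categorize` are all hypothetical.

```python
from typing import List, Sequence, Tuple


def lcs(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return one longest common subsequence of two token lists (classic DP)."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the front to recover the subsequence itself.
    out: List[str] = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def signature(group: List[List[str]]) -> List[str]:
    """Fold LCS over a group: a subsequence common to every member (not always the longest)."""
    sig = list(group[0])
    for tokens in group[1:]:
        sig = lcs(sig, tokens)
    return sig


def similarity(sig: Sequence[str], tokens: Sequence[str]) -> float:
    """Hypothetical Dice-style score: 2*|LCS| / (|sig| + |tokens|)."""
    total = len(sig) + len(tokens)
    return 0.0 if total == 0 else 2 * len(lcs(sig, tokens)) / total


def categorize(messages: List[str], seeds: List[int],
               max_iter: int = 10) -> Tuple[List[int], List[List[str]]]:
    """Assign each message an event-type index; one event type per seed message."""
    tokens = [m.split() for m in messages]  # naive whitespace tokenization
    sigs = [list(tokens[i]) for i in seeds]

    def best(t: List[str]) -> int:
        # Index of the signature most similar to t (lowest index wins ties).
        return max(range(len(sigs)), key=lambda c: similarity(sigs[c], t))

    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [best(t) for t in tokens]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each event type's signature from its current members.
        for c in range(len(sigs)):
            members = [t for t, lab in zip(tokens, labels) if lab == c]
            if members:
                sigs[c] = signature(members)
    return labels, sigs


if __name__ == "__main__":
    logs = [
        "Connection from 10.0.0.1 closed",
        "Connection from 10.0.0.2 closed",
        "Disk sda1 is full",
        "Disk sdb2 is full",
    ]
    labels, sigs = categorize(logs, seeds=[0, 2])
    print(labels)  # [0, 0, 1, 1]
    print(sigs)    # [['Connection', 'from', 'closed'], ['Disk', 'is', 'full']]
```

Folding pairwise LCS keeps the sketch short, but the result depends on member order and can miss a longer subsequence shared by the whole group. Fixing the seeds also fixes the number of event types up front. A fuller treatment would search over signatures directly and could encode domain knowledge, for example as required keywords.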