CAPRI: a tool for mining complex line patterns in large log data

Authors:
Farhana Zulkernine;Patrick Martin;Wendy Powley;Sima Soltani;Serge Mankovskii;Mark Addleman
Affiliations:
Queen's University, Kingston, ON, Canada;Queen's University, Kingston, ON, Canada;Queen's University, Kingston, ON, Canada;Queen's University, Kingston, ON, Canada;CA Labs, Toronto, ON, Canada;CA Technologies Inc., San Francisco
Venue:
Proceedings of the 2nd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
Year:
2013

Abstract

Log files provide important information for troubleshooting complex systems. However, the structure and contents of log data and messages vary widely. For automated processing, it is necessary to first understand the layout and structure of the data, which becomes very challenging when different system components report massive amounts of data and messages to the same log file. Existing approaches apply supervised mining techniques and return frequent patterns only for single-line messages. We present CAPRI (type-CAsted Pattern and Rule mIner), which uses a novel pattern mining algorithm to efficiently mine structural line patterns from semi-structured multi-line log messages. Using unsupervised learning and incremental mining techniques, it discovers line patterns in a type-casted format, categorizes all data lines, and identifies frequent, rare, and interesting line patterns. It also mines association rules to identify the contextual relationship between two successive line patterns. In addition, CAPRI lists the frequent term and value patterns that meet given minimum support thresholds. The line and term pattern information can be applied in a subsequent stage to categorize and reformat multi-line data, extract variables from the messages, and discover further correlations among messages for troubleshooting complex systems. To evaluate our approach, we present a comparative study of our tool against several popular open-source research tools using three different layouts of log data, including a complex multi-line log file from the z/OS mainframe system.
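
The ideas named in the abstract (type-casted line patterns, frequent and rare patterns under a minimum support threshold, and rules between successive line patterns) can be illustrated with a minimal Python sketch. This is not CAPRI's algorithm, which the abstract does not specify. The token type tags, the regular expressions, the function names, the sample log lines, and the thresholds are all assumptions made up for illustration. The sketch also splits on whitespace and so ignores indentation, which a real multi-line miner may need to keep.

```python
import re
from collections import Counter

# Hypothetical type-casting rules (not from the paper): each token is
# replaced by a coarse type tag so that lines sharing a layout collapse
# to the same line pattern. Rules are tried in order; fullmatch is used,
# so a token must match a rule in its entirety.
TOKEN_TYPES = [
    (re.compile(r"\d{2}:\d{2}:\d{2}"), "<TIME>"),
    (re.compile(r"[+-]?\d+\.\d+"), "<FLOAT>"),
    (re.compile(r"[+-]?\d+"), "<INT>"),
    (re.compile(r"[A-Za-z_]+"), "<WORD>"),
]


def type_cast(line):
    """Map a raw log line to a space-separated string of type tags."""
    tags = []
    for token in line.split():
        for regex, tag in TOKEN_TYPES:
            if regex.fullmatch(token):
                tags.append(tag)
                break
        else:
            tags.append("<OTHER>")  # no rule matched this token
    return " ".join(tags)


def mine_line_patterns(lines, min_support):
    """Return (pattern sequence, frequent patterns, rare patterns).

    A pattern is frequent when its share of all non-empty lines is at
    least min_support; every other observed pattern is reported as rare.
    """
    patterns = [type_cast(line) for line in lines if line.strip()]
    counts = Counter(patterns)
    total = len(patterns)
    frequent = {p: c for p, c in counts.items() if c / total >= min_support}
    rare = {p: c for p, c in counts.items() if c / total < min_support}
    return patterns, frequent, rare


def successive_rules(patterns, min_confidence):
    """Rules 'pattern a is followed by pattern b' with their confidence.

    Confidence is count(a directly followed by b) divided by the number
    of times a occurs with any successor.
    """
    pair_counts = Counter(zip(patterns, patterns[1:]))
    first_counts = Counter(patterns[:-1])
    rules = {}
    for (a, b), n in pair_counts.items():
        confidence = n / first_counts[a]
        if confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


# Invented sample data: a header line followed by an indented detail line.
SAMPLE_LOG = [
    "12:00:01 JOB 101 STARTED",
    "  step 1 rc 0",
    "12:00:05 JOB 102 STARTED",
    "  step 1 rc 0",
    "12:00:09 JOB 103 STARTED",
    "  step 2 rc 8",
    "ABEND code=0C4",
]

patterns, frequent, rare = mine_line_patterns(SAMPLE_LOG, min_support=0.3)
rules = successive_rules(patterns, min_confidence=0.6)
```

On this sample, the header and detail layouts each cover three of the seven lines and count as frequent, the single `ABEND` line is reported as rare, and the rule "header is followed by detail" holds every time the header appears. A single pass with counters like this could also be updated as new lines arrive, which is one simple way to read "incremental mining", though the abstract does not say how CAPRI does it.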