An integrated framework for optimizing automatic monitoring systems in large IT infrastructures

Authors:
Liang Tang; Tao Li; Larisa Shwartz; Florian Pinel; Genady Ya Grabarnik
Affiliations:
Florida International University, Miami, FL, USA; Florida International University, Miami, FL, USA; IBM, Yorktown, NY, USA; IBM, Yorktown, NY, USA; St. John's University, New York, NY, USA
Venue:
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2013

Abstract

The competitive business climate and the complexity of IT environments demand efficient and cost-effective delivery and support of IT services. This is largely achieved by automating routine maintenance procedures, including problem detection, determination, and resolution. System monitoring provides an effective and reliable means of problem detection. Coupled with automated ticket creation, it ensures that a degradation of the vital signs, defined by acceptable thresholds or monitoring conditions, is flagged as a problem candidate and sent to support personnel as an incident ticket. This paper describes an integrated framework for minimizing false positive tickets and maximizing the monitoring coverage of system faults. In particular, the framework defines monitoring conditions and their optimal delay times based on an off-line analysis of historical alerts and incident tickets. Candidate monitoring conditions are built on a set of predictive rules that are automatically generated by a rule-based learning algorithm using coverage, confidence, and rule complexity criteria. These conditions and delay times are propagated as configurations into run-time monitoring systems. Moreover, some misconfigured monitoring conditions can be corrected using false negative tickets, which are discovered by a separate text classification algorithm in the framework. The paper also provides implementation details of a program product that uses this framework and shows illustrative examples of successful results.
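
The idea of choosing a delay time from historical alerts can be illustrated with a minimal sketch. This is not the paper's algorithm, only one simple way to read the abstract's description: an alert is suppressed when the violated condition clears before the delay elapses, and the delay is chosen to suppress as many false alerts as possible without losing more than an allowed number of real ones. The names `Alert`, `duration_min`, `is_real`, `choose_delay`, and `max_missed_real`, as well as the sample history, are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Alert:
    """Hypothetical historical alert record (not the paper's data model)."""
    duration_min: float  # how long the monitored condition stayed violated
    is_real: bool        # True if the resulting ticket required action


def choose_delay(alerts: List[Alert], max_missed_real: int = 0) -> Tuple[float, int]:
    """Return (delay, false_alerts_removed).

    An alert is suppressed when its duration is shorter than the delay.
    Among the candidate delays, pick the one that suppresses the most
    false alerts while suppressing at most `max_missed_real` real ones.
    """
    # Only observed durations (plus zero) can change which alerts are suppressed.
    candidates = sorted({0.0} | {a.duration_min for a in alerts})
    best_delay, best_removed = 0.0, 0
    for delay in candidates:
        suppressed = [a for a in alerts if a.duration_min < delay]
        missed_real = sum(a.is_real for a in suppressed)
        removed = len(suppressed) - missed_real
        if missed_real <= max_missed_real and removed > best_removed:
            best_delay, best_removed = delay, removed
    return best_delay, best_removed


# Made-up history: four transient false alerts and two real faults.
history = [
    Alert(2.0, False), Alert(3.0, False), Alert(5.0, False),
    Alert(8.0, True), Alert(12.0, False), Alert(20.0, True),
]

delay, removed = choose_delay(history)
print(f"delay={delay} min, false alerts removed={removed}")
```

Raising `max_missed_real` trades monitoring coverage for fewer false positive tickets, which is the tension the abstract describes; a real deployment would presumably tune this per monitoring condition.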