Mining hot clusters of similar anomalies for system management

Authors:
Dapeng Zhang;Fen Lin;Zhongzhi Shi;Heqing Huang
Affiliations:
Key Lab. of Intelligent Inf. Processing, Inst. of Computing Techn., Chinese Academy of Sciences, Beijing, China and Graduate School of the Chinese Academy of Sciences, Beijing, China and Inst. of ...;Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China and Graduate School of the Chinese Academy of Sciences, Beijing ...;Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
Venue:
PRICAI'10 Proceedings of the 11th Pacific Rim international conference on Trends in artificial intelligence
Year:
2010

Abstract

Recently, automatic system management has drawn much attention to mining system log files for anomaly detection, diagnosis, and prediction. An important problem in this area is mining hot clusters of similar anomalies. A hot anomaly cluster is defined as a largest-sized group of similar anomalies whose similarity satisfies some user-specified constraints. Because some major anomalies have common symptoms and are shared by several hot clusters, these clusters do not have to be disjoint, so the problem cannot be easily solved by existing clustering algorithms such as k-means and EM. In this paper we propose a novel heuristic clustering algorithm, named Hot Clustering (HC), for mining these patterns. The key idea of HC is to group neighboring anomalies into hot clusters based on heuristic rules. To validate our approach, we run k-means, EM, and HC on bug reports from the Bugzilla database. The experimental results show that our approach is both efficient and effective for this problem.
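
The abstract does not spell out HC's heuristic rules, so the following is only a minimal sketch of the general idea of non-disjoint groups of neighboring anomalies. It is not the authors' algorithm. The Jaccard similarity over keyword sets, the threshold `min_sim`, the size cutoff `min_size`, the function names, and the sample reports are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two keyword sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def overlapping_groups(
    items: Dict[str, Set[str]], min_sim: float, min_size: int = 2
) -> List[Set[str]]:
    """Collect, for every item, the items at least `min_sim` similar to it,
    then keep only groups that are not contained in a larger group.
    Groups may share members, unlike the partitions produced by k-means."""
    neighborhoods = {
        frozenset(o for o, t in items.items() if jaccard(tokens, t) >= min_sim)
        for tokens in items.values()
    }
    # Drop any neighborhood that is a proper subset of another one.
    maximal = [
        set(g) for g in neighborhoods
        if len(g) >= min_size and not any(g < h for h in neighborhoods)
    ]
    # Largest groups first; ties broken by member names for a stable order.
    return sorted(maximal, key=lambda g: (-len(g), sorted(g)))


# Hypothetical bug reports reduced to keyword sets.
reports = {
    "r1": {"crash", "startup", "plugin"},
    "r2": {"crash", "startup", "render"},
    "r3": {"crash", "render", "gpu"},
    "r4": {"hang", "render", "gpu"},
    "r5": {"timeout", "network"},
}

groups = overlapping_groups(reports, min_sim=0.5)
for group in groups:
    print(sorted(group))
```

In this toy data, reports r2 and r3 belong to both resulting groups, which is the kind of overlap that a partitioning method such as k-means cannot represent directly.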