On Clustering Validation Techniques
Journal of Intelligent Information Systems
DHC: A Density-Based Hierarchical Clustering Method for Time Series Gene Expression Data
BIBE '03 Proceedings of the 3rd IEEE Symposium on BioInformatics and BioEngineering
An extensive empirical study of feature selection metrics for text classification
The Journal of Machine Learning Research
Failure Diagnosis Using Decision Trees
ICAC '04 Proceedings of the First International Conference on Autonomic Computing
Mining Logs Files for Computing System Management
ICAC '05 Proceedings of the Second International Conference on Automatic Computing
Have things changed now?: an empirical study of bug characteristics in modern open source software
Proceedings of the 1st workshop on Architectural and system support for improving software dependability
Failure Prediction in IBM BlueGene/L Event Logs
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Hi-index | 0.00 |
Recently automatic system management has attracted much attention on mining system log files for anomaly detection, diagnosis and prediction. An important problem in this area is mining hot clusters of similar anomalies for system management. A hot anomaly cluster is defined as a largest-sized group of similar anomalies, whose similarity satisfies some user-specified constraints. While, some major anomalies have common symptoms and are shared by several hot clusters, these clusters do not have to be disjoint. So this problem could not be easily solved by existing clustering algorithms, such as k-means and EM. In this paper we propose a novel heuristic clustering algorithm, named Hot Clustering (HC), for mining these patterns. The key idea of HC is to group neighboring anomalies into hot clusters based on some heuristic rules. To validate our approach, we perform the experiment on bug reports from Bugzilla database by k-means, EM and HC. The experimental results show that our approach is both efficient and effective for this problem.