A clustering model based on matrix approximation with applications to cluster system log files

Authors:
Tao Li; Wei Peng
Affiliations:
School of Computer Science, Florida International University, Miami, FL; School of Computer Science, Florida International University, Miami, FL
Venue:
ECML '05: Proceedings of the 16th European Conference on Machine Learning
Year:
2005

Abstract

In system management applications, automated analysis of historical data across multiple components when problems occur requires clustering log messages with disparate formats, so that the common set of semantic situations can be inferred automatically and a brief description obtained for each situation. In this paper, we propose a clustering model in which clustering is formulated as a matrix approximation problem: the objective is to minimize the approximation error between the original data matrix and the matrix reconstructed from the cluster structures. The model explicitly characterizes both data and feature memberships and thus enables a description of each cluster. We present a two-sided spectral relaxation optimization procedure for the model and establish connections between our clustering model and existing approaches. Experimental results demonstrate the effectiveness of the proposed approach.
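The matrix-approximation idea in the abstract can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' algorithm: we assume the reconstruction has the form `W ≈ C B D^T`, where `W` is an n x m message-by-term matrix, `C` (n x k) and `D` (m x k) are hard data and feature membership matrices, and `B` is a k x k block matrix. If `C` and `D` are relaxed to matrices with orthonormal columns, the squared Frobenius error is minimized by the top-k left and right singular vectors of `W`. Rounding those vectors to hard labels with plain k-means is our own choice, not something the abstract specifies. The function names `two_sided_spectral_cocluster`, `_kmeans` and `_indicator`, and the toy matrix, are hypothetical.

```python
import numpy as np


def _kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    # Start from the first row, then repeatedly add the row farthest
    # from all centroids chosen so far.
    idx = [0]
    for _ in range(1, k):
        d = np.min(((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1), axis=1)
        idx.append(int(np.argmax(d)))
    centers = X[idx].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        new_labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[new_labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centers[j] = members.mean(axis=0)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def _indicator(labels, k):
    """n x k 0/1 membership matrix built from hard cluster labels."""
    M = np.zeros((len(labels), k))
    M[np.arange(len(labels)), labels] = 1.0
    return M


def two_sided_spectral_cocluster(W, k):
    """Hypothetical sketch: approximate W by C @ B @ D.T.

    C assigns rows (log messages) to clusters and D assigns columns (terms).
    With C and D relaxed to orthonormal columns, ||W - C B D^T||_F is
    minimized by the top-k left/right singular vectors of W; k-means then
    rounds each side to hard labels.
    """
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    row_labels = _kmeans(U[:, :k], k)   # relaxed data memberships -> labels
    col_labels = _kmeans(Vt[:k].T, k)   # relaxed feature memberships -> labels
    C, D = _indicator(row_labels, k), _indicator(col_labels, k)
    # Least-squares B for fixed C and D: the mean of W over each block.
    B = np.linalg.pinv(C) @ W @ np.linalg.pinv(D).T
    error = np.linalg.norm(W - C @ B @ D.T) ** 2
    return row_labels, col_labels, B, error


# Hypothetical toy data: 6 log messages (rows) x 6 terms (columns).
W = np.array([
    [2, 2, 2, 0, 0, 0],
    [2, 2, 2, 0, 0, 0],
    [2, 2, 2, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
], dtype=float)
rows, cols, B, err = two_sided_spectral_cocluster(W, k=2)
print(rows, cols, round(err, 6))
```

In this sketch the column labels are what would allow each message cluster to be summarized by its dominant terms, which mirrors the abstract's point that explicit feature memberships enable cluster descriptions. Once the memberships are fixed, using block means for `B` is the least-squares optimum. The paper's own procedure may differ.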