Relationship-Based Clustering and Visualization for High-Dimensional Data Mining
INFORMS Journal on Computing
Orthogonal nonnegative matrix t-factorizations for clustering
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Clustering is a fundamental data mining technique for identifying the essential group structures in a given data matrix. Traditional methods perform one-way clustering, which has limitations for high-dimensional matrices and for matrices with missing values. One possible solution is co-clustering, which clusters the columns and rows simultaneously. Auxiliary information on columns or rows can also help stabilize and improve clustering performance. We propose a new co-clustering approach that can incorporate auxiliary information on both columns and rows. Our approach is based on a probabilistic model, for which we present an efficient parameter estimation method based on variational Bayesian learning. The problem setting can be semi-supervised, which allows our approach to be applied to a variety of data mining applications. We evaluated the proposed approach on both synthetic and real datasets, and the results confirmed the clear advantage of incorporating auxiliary information, as well as the advantage of our method over two competing methods.
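To make the idea of clustering rows and columns simultaneously concrete, the following is a minimal sketch of a two-way spectral co-clustering heuristic. It is not the probabilistic model or the variational Bayesian estimation proposed here, and it uses no auxiliary information; it only illustrates how a single decomposition can assign rows and columns to shared groups. The function name `spectral_cocluster_2way` and the toy matrix `A` are assumptions made for illustration.

```python
import numpy as np


def spectral_cocluster_2way(A):
    """Split the rows and columns of a nonnegative matrix into two co-clusters.

    Illustrative spectral heuristic only (an assumption of this sketch),
    not the probabilistic co-clustering model described in the abstract.
    Requires every row sum and column sum of A to be positive.
    """
    A = np.asarray(A, dtype=float)
    d_rows = A.sum(axis=1)  # row sums
    d_cols = A.sum(axis=0)  # column sums
    # Scale each entry by 1 / sqrt(row_sum * col_sum).
    A_norm = A / np.sqrt(np.outer(d_rows, d_cols))
    U, _, Vt = np.linalg.svd(A_norm, full_matrices=False)
    # The second left/right singular vectors are paired, so their signs
    # split rows and columns consistently into the same two groups.
    row_labels = (U[:, 1] > 0).astype(int)
    col_labels = (Vt[1, :] > 0).astype(int)
    return row_labels, col_labels


# Toy data matrix with two planted blocks and weak off-block entries.
A = np.array([
    [5.0, 5.0, 5.0, 0.1, 0.1],
    [5.0, 5.0, 5.0, 0.1, 0.1],
    [5.0, 5.0, 5.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 5.0, 5.0],
    [0.1, 0.1, 0.1, 5.0, 5.0],
    [0.1, 0.1, 0.1, 5.0, 5.0],
])
row_labels, col_labels = spectral_cocluster_2way(A)
print("row labels:", row_labels)
print("column labels:", col_labels)
```

In this toy case the first three rows share a label with the first three columns, and the remaining rows share the other label with the remaining columns. Which group receives label 0 or 1 depends on the sign convention of the SVD, so only the grouping itself is meaningful.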