Semi-supervised k-means clustering by optimizing initial cluster centers

Authors:
Xin Wang; Chaofei Wang; Junyi Shen
Affiliations:
Department of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China; China Defense Science and Technology Information Center, Beijing, China; Department of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
Venue:
WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II
Year:
2011

Abstract

Semi-supervised clustering uses a small amount of labeled data to aid and bias the clustering of unlabeled data. This paper explores how labeled data can be used to generate and optimize the initial cluster centers for the k-means algorithm. It proposes a max-distance search approach to find optimal initial cluster centers among the unlabeled data, especially when the labeled data cannot provide enough initial cluster centers. Experimental results demonstrate the advantages of this method over two baselines: standard random selection, and partial random selection, in which some initial cluster centers come from labeled data while the others are chosen at random from unlabeled data.
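
The abstract does not spell out the max-distance search, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that each labeled class contributes one seed center (the class mean) and that any missing centers are filled in by a farthest-point rule: repeatedly pick the unlabeled point whose distance to its nearest existing center is largest. The names `init_centers` and `mean`, and both of these rules, are assumptions introduced for illustration.

```python
import math
from collections import defaultdict


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def init_centers(labeled, unlabeled, k):
    """Return up to k initial centers for k-means (hypothetical helper).

    labeled:   list of (vector, class_label) pairs
    unlabeled: list of vectors
    """
    # Assumption: each labeled class seeds one center, the mean of its points.
    by_class = defaultdict(list)
    for x, y in labeled:
        by_class[y].append(x)
    centers = [mean(pts) for pts in by_class.values()][:k]

    # Assumed max-distance rule: while centers are missing, add the unlabeled
    # point that is farthest from its nearest already-chosen center.
    candidates = list(unlabeled)
    while len(centers) < k and candidates:
        if centers:
            best = max(
                candidates,
                key=lambda p: min(math.dist(p, c) for c in centers),
            )
        else:
            # No labeled seeds at all: fall back to the first unlabeled point.
            best = candidates[0]
        centers.append(list(best))
        candidates.remove(best)
    return centers
```

The returned list could then be passed as the starting centers of any k-means implementation that accepts explicit initial centers. Under partial random selection, by contrast, the loop would draw the missing centers uniformly at random from `candidates` instead of maximizing the nearest-center distance.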