A novel initialization method for semi-supervised clustering

Authors:
Yanzhong Dang; Zhaoguo Xuan; Lili Rong; Ming Liu
Affiliations:
Institute of Systems Engineering, Dalian University of Technology, Dalian, China (all four authors)
Venue:
Proceedings of the 4th International Conference on Knowledge Science, Engineering and Management (KSEM '10)
Year:
2010

Abstract

In recent years, semi-supervised clustering has attracted increasing research attention. For most semi-supervised clustering algorithms, a good initialization method produces high-quality seeds, which help improve clustering accuracy. In real-world settings, labeled samples are scarce while unlabeled ones are plentiful, yet most existing initialization methods set the unlabeled data aside, even though it may contain information useful for the clustering task. In this paper, we propose a novel initialization method that converts some of the unlabeled samples into labeled ones: the neighbors of the labeled samples are first identified, and the known labels are then propagated to those unlabeled neighbors. Experimental results show that the proposed initialization method can improve the performance of semi-supervised clustering.
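The two-stage idea (find the neighbors of labeled samples, then propagate their labels to build a larger seed set) can be illustrated with a minimal sketch. The abstract does not say how neighbors are defined or how conflicting labels are resolved, so everything below is an assumption: Euclidean distance, each labeled sample claiming its `k` nearest unlabeled samples, and a sample being transferred only when all claims on it agree. The function names `propagate_labels`, `seed_centroids` and `kmeans` and the parameter `k` are hypothetical and are not the authors' method. The expanded seed set then gives one initial centroid per class, which a plain k-means loop refines.

```python
import math
from collections import defaultdict

def dist(a, b):
    """Euclidean distance (assumed; the abstract names no metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def propagate_labels(labeled, unlabeled, k=2):
    """Hypothetical neighbor-based label transfer.

    labeled: list of (point, label); unlabeled: list of points.
    Each labeled point claims its k nearest unlabeled points; an unlabeled
    point joins the labeled set only if all of its claims agree.
    """
    claims = defaultdict(set)  # unlabeled index -> labels that claimed it
    for x, y in labeled:
        order = sorted(range(len(unlabeled)), key=lambda i: dist(x, unlabeled[i]))
        for i in order[:k]:
            claims[i].add(y)
    expanded, remaining = list(labeled), []
    for i, u in enumerate(unlabeled):
        if len(claims[i]) == 1:
            # Unambiguous claim: adopt the single proposed label.
            expanded.append((u, next(iter(claims[i]))))
        else:
            # Unclaimed or conflicting claims: leave the point unlabeled.
            remaining.append(u)
    return expanded, remaining

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def seed_centroids(expanded):
    """One initial centroid per label: the mean of that label's seeds."""
    groups = defaultdict(list)
    for x, y in expanded:
        groups[y].append(x)
    return {y: mean(pts) for y, pts in groups.items()}

def kmeans(points, centroids, iters=10):
    """Plain k-means refinement starting from the seeded centroids."""
    centroids = dict(centroids)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(centroids, key=lambda y: dist(p, centroids[y])) for p in points]
        # Update step: recompute centroids; keep the old one if a cluster empties.
        for y in centroids:
            members = [p for p, lab in zip(points, labels) if lab == y]
            if members:
                centroids[y] = mean(members)
    return labels, centroids

# Usage: two labeled samples, five unlabeled ones.
labeled = [((0.0, 0.0), "A"), ((10.0, 10.0), "B")]
unlabeled = [(0.5, 0.2), (1.0, 0.0), (9.5, 10.0), (10.0, 9.0), (5.0, 5.0)]
expanded, remaining = propagate_labels(labeled, unlabeled, k=2)
seeds = seed_centroids(expanded)
points = [x for x, _ in labeled] + unlabeled
labels, final = kmeans(points, seeds)
```

In this toy run the four points closest to the two labeled samples are transferred into the seed set, while (5.0, 5.0), which no labeled sample claims, stays unlabeled. Requiring unanimous claims trades coverage for precision: fewer points are converted, but a wrong label is less likely to contaminate the seeds.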