Clustering documents with labeled and unlabeled documents using fuzzy semi-Kmeans

  • Authors:
  • Chien-Liang Liu; Tao-Hsing Chang; Hsuan-Hsun Li

  • Affiliations:
  • Information and Communications Research Laboratories, Industrial Technology Research Institute, Rm. 709, Bldg. 51, 195, Sec. 4, Chung Hsing Rd., Chutung, Hsinchu 310, Taiwan, ROC; Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, Chien Kung Campus 415, Chien Kung Road, Kaohsiung 807, Taiwan, ROC; Department of Computer Science, National Chiao Tung University, 1001 University Road, Hsinchu 300, Taiwan, ROC

  • Venue:
  • Fuzzy Sets and Systems
  • Year:
  • 2013

Quantified Score

Hi-index 0.20

Abstract

Focusing on document clustering, this work presents a fuzzy semi-supervised clustering algorithm called fuzzy semi-Kmeans. Fuzzy semi-Kmeans extends the K-means clustering model and is inspired by the EM algorithm and the Gaussian mixture model. It also provides the flexibility to employ different fuzzy membership functions to measure the distance between data points. This work uses a Gaussian weighting function in its experiments, but a cosine similarity function can be used as well. Experiments on three data sets compare fuzzy semi-Kmeans with several other methods, and the results indicate that fuzzy semi-Kmeans generally outperforms the other methods.
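The abstract does not give the update rules, so the following is only an illustrative sketch of the general idea it describes, not the authors' algorithm. It assumes that labeled documents keep fixed one-hot memberships, that unlabeled documents receive soft memberships from a Gaussian weighting of squared Euclidean distance, and that centroids are membership-weighted means updated in an EM-like loop. The names `gaussian_membership`, `fuzzy_semi_kmeans_sketch`, and the parameter `sigma` are hypothetical.

```python
import numpy as np


def gaussian_membership(X, centroids, sigma=1.0):
    """Soft memberships from a Gaussian weighting of squared Euclidean distance.

    Assumption: this weighting form is illustrative; the paper's exact
    membership function is not stated in the abstract.
    """
    # Squared distances between every document and every centroid,
    # shape (n_documents, n_clusters).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Shift by the row-wise minimum before exponentiating for numerical
    # stability; the shift cancels when each row is normalized.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)


def fuzzy_semi_kmeans_sketch(X_labeled, y_labeled, X_unlabeled,
                             n_clusters, sigma=1.0, n_iter=50):
    """Hypothetical semi-supervised fuzzy K-means loop (not the paper's method).

    Assumes every cluster has at least one labeled document and that
    labels are integers in the range 0 .. n_clusters - 1.
    """
    # Labeled documents keep fixed one-hot memberships.
    hard = np.eye(n_clusters)[y_labeled]
    # Initialize each centroid as the mean of its labeled documents.
    centroids = np.stack([X_labeled[y_labeled == k].mean(axis=0)
                          for k in range(n_clusters)])
    X_all = np.vstack([X_labeled, X_unlabeled])
    for _ in range(n_iter):
        # E-like step: soft memberships for the unlabeled documents only.
        soft = gaussian_membership(X_unlabeled, centroids, sigma)
        U = np.vstack([hard, soft])
        # M-like step: membership-weighted mean of all documents.
        new_centroids = (U.T @ X_all) / U.sum(axis=0)[:, None]
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, gaussian_membership(X_unlabeled, centroids, sigma)
```

Under these assumptions, a cosine-based variant would replace only `gaussian_membership` with a function that normalizes non-negative cosine similarities per document; the rest of the loop would stay the same.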