The Evaluation Measure of Text Clustering for the Variable Number of Clusters

  • Authors:
  • Taeho Jo; Malrey Lee

  • Affiliations:
  • Advanced Graduate Education Center of Jeonbuk for Electronics and Information Technology-BK21; The Research Center of Industrial Technology, School of Electronics & Information Engineering, ChonBuk National University, 664-14, 1Ga, DeokJin-Dong, JeonJu, ChonBuk, 561-756, South Korea

  • Venue:
  • ISNN '07 Proceedings of the 4th international symposium on Neural Networks: Part II--Advances in Neural Networks
  • Year:
  • 2007

Abstract

This study proposes a new measure for evaluating the performance of text clustering. When the K-means algorithm or Kohonen Networks are used for text clustering, the number of clusters is fixed in advance as one of their parameters; when the single pass algorithm is used, the number of clusters cannot be predicted. Given labeled documents, the result of text clustering with the K-means algorithm or a Kohonen Network can be evaluated by setting the number of clusters to the number of given target categories, mapping each cluster to a target category, and applying the evaluation measures of text categorization. With the single pass algorithm, however, if the number of clusters differs from the number of target categories, such measures are useless for evaluating the clustering result. This study therefore proposes an evaluation measure of text clustering based on intra-cluster similarity and inter-cluster similarity, called CI (Clustering Index) in this article.
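The abstract names the ingredients of CI (intra-cluster and inter-cluster similarity) but not its formula. The following is a minimal sketch of how such a score can be computed for any number of clusters without mapping clusters to target categories. It assumes documents are term-weight vectors compared by cosine similarity, takes intra-cluster similarity as the mean similarity of each document to its own cluster centroid, takes inter-cluster similarity as the mean similarity between centroids of distinct clusters, and combines them as intra / (intra + inter). The function name `clustering_index_sketch` and this combination rule are hypothetical illustrations, not the CI defined in the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def clustering_index_sketch(clusters):
    """Hypothetical intra/inter similarity index (NOT the paper's CI formula).

    clusters: list of non-empty clusters, each a list of document vectors.
    Returns (intra, inter, index):
      intra = mean similarity of each document to its own cluster centroid
      inter = mean similarity between centroids of distinct clusters
      index = intra / (intra + inter), an assumed way to combine the two
    """
    cents = [centroid(c) for c in clusters]
    sims = [cosine(doc, cents[k]) for k, c in enumerate(clusters) for doc in c]
    intra = sum(sims) / len(sims)
    pairs = list(combinations(cents, 2))
    # With a single cluster there are no centroid pairs, so inter is taken as 0.
    inter = sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    total = intra + inter
    index = intra / total if total > 0 else 0.0
    return intra, inter, index


# Toy two-term documents: topic A lies on the first axis, topic B on the second.
good = [[[1, 0], [2, 0]], [[0, 1], [0, 3]]]    # 2 clusters, one per topic
three = [[[1, 0]], [[2, 0]], [[0, 1], [0, 3]]]  # 3 clusters, topic A split in two
mixed = [[[1, 0], [0, 1]], [[2, 0], [0, 3]]]    # 2 clusters, each mixing topics

print(round(clustering_index_sketch(good)[2], 3))   # 1.0
print(round(clustering_index_sketch(three)[2], 3))  # 0.75
print(round(clustering_index_sketch(mixed)[2], 3))  # lowest of the three
```

Because the score uses only the documents and the cluster assignments, a 3-cluster result (such as one produced by a single pass algorithm) is scored on the same scale as a 2-cluster result; in this toy case, splitting one topic raises inter-cluster similarity and lowers the score, and mixing topics lowers it further.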