The Evaluation Measure of Text Clustering for the Variable Number of Clusters

Authors:
Taeho Jo;Malrey Lee
Affiliations:
Advanced Graduate Education Center of Jeonbuk for Electronics and Information Technology-BK21;The Research Center of Industrial Technology, School of Electronics & Information Engineering, ChonBuk National University, 664-14, 1Ga, DeokJin-Dong, JeonJu, ChonBuk, 561-756, South Korea
Venue:
ISNN '07 Proceedings of the 4th international symposium on Neural Networks: Part II--Advances in Neural Networks
Year:
2007

Abstract

This study proposes an innovative measure for evaluating the performance of text clustering. With the K-means algorithm and Kohonen networks, the number of clusters is fixed in advance as a parameter, whereas with the single pass algorithm the number of clusters cannot be predicted. Given labeled documents, the result of clustering with the K-means algorithm or a Kohonen network can be evaluated by setting the number of clusters to the number of given target categories, mapping each cluster to a target category, and applying the evaluation measures of text categorization. With the single pass algorithm, however, the number of clusters may differ from the number of target categories, and in that case such measures cannot be used to evaluate the clustering result. This study therefore proposes an evaluation measure of text clustering based on intra-cluster similarity and inter-cluster similarity, called CI (Clustering Index) in this article.
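
The two quantities named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how CI combines them, so the sketch stops at the two ingredients and does not attempt the CI formula itself. Everything in it is an assumption made for illustration: the function names are hypothetical, documents are taken to be numeric feature vectors, similarity is taken to be cosine similarity, and both quantities are taken to be plain averages. Neither function needs target categories or a fixed number of clusters, which is the property the abstract asks for.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def intra_cluster_similarity(clusters):
    """Average, over clusters, of the mean pairwise similarity inside each cluster.

    Assumption: a single-document cluster counts as perfectly cohesive (1.0).
    `clusters` must be a non-empty list of non-empty lists of vectors.
    """
    scores = []
    for members in clusters:
        pairs = list(combinations(members, 2))
        if not pairs:
            scores.append(1.0)
        else:
            scores.append(sum(cosine(u, v) for u, v in pairs) / len(pairs))
    return sum(scores) / len(scores)


def inter_cluster_similarity(clusters):
    """Mean similarity over all document pairs drawn from two different clusters.

    Assumption: with only one cluster there is nothing to compare, so return 0.0.
    """
    sims = []
    for first, second in combinations(clusters, 2):
        sims.extend(cosine(u, v) for u in first for v in second)
    return sum(sims) / len(sims) if sims else 0.0


# Works for any number of clusters, e.g. the output of a single pass algorithm.
clusters = [[(1.0, 0.0), (1.0, 0.0)], [(0.0, 1.0)]]
intra = intra_cluster_similarity(clusters)
inter = inter_cluster_similarity(clusters)
```

Under these assumptions a good clustering has high intra-cluster similarity and low inter-cluster similarity; how the paper weighs the two against each other in CI is not recoverable from the abstract.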