Comparative Advantage Approach for Sparse Text Data Clustering

Authors:
Jie Ji; Tony Y. T. Chan; Qiangfu Zhao
Venue:
CIT '09 Proceedings of the 2009 Ninth IEEE International Conference on Computer and Information Technology - Volume 02
Year:
2009

Citing 0
Cited 1

Fast document clustering based on weighted comparative advantage

SMC'09 Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics

Abstract

Document clustering is the process of partitioning a set of n unlabeled documents into clusters such that the documents in each cluster share some common concepts. Each concept is conveniently represented by some key terms. Using words as features, each document is represented as a vector in a very high-dimensional vector space. However, most document vectors are sparse: for example, they may have more than ten thousand dimensions with a sparsity of 98%. In this paper, we study a fast classification algorithm based on the idea of comparative advantage for clustering sparse data. The proposed algorithm uses one “ruler” instead of k centers to identify the comparative advantage of each cluster and to define the cluster label for each document. Experimental results show that our algorithm achieves performance comparable to k-means while running faster. It can also produce clusters whose concepts overlap less, in the sense of key terms.
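To make the sparsity figure in the abstract concrete, the following minimal sketch builds word-count vectors for a toy corpus and measures the fraction of zero entries. It is an illustration only and not the authors' preprocessing or their comparative advantage algorithm, which the abstract does not specify. The function names `build_term_vectors` and `sparsity`, the whitespace tokenization, and the three sample documents are all assumptions made for this example.

```python
from collections import Counter


def build_term_vectors(docs):
    """Return (vocab, vectors); each vector is a dict {term_index: count}.

    Hypothetical helper: lowercases and splits on whitespace, which is an
    assumed tokenization, not one taken from the paper.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    # Store only nonzero entries, so each document stays a sparse vector.
    vectors = [
        {index[term]: count for term, count in Counter(toks).items()}
        for toks in tokenized
    ]
    return vocab, vectors


def sparsity(vectors, n_dims):
    """Fraction of zero entries in the documents-by-terms matrix."""
    nonzero = sum(len(vec) for vec in vectors)
    return 1.0 - nonzero / (len(vectors) * n_dims)


# Toy corpus (invented for illustration).
docs = [
    "sparse text clustering",
    "fast clustering algorithm",
    "comparative advantage in trade",
]
vocab, vectors = build_term_vectors(docs)
print(f"dimensions: {len(vocab)}, sparsity: {sparsity(vectors, len(vocab)):.2%}")
```

Even in this tiny corpus most entries are zero. With a realistic vocabulary of more than ten thousand terms, each document uses only a small fraction of the dimensions, which is how sparsity levels such as the 98% quoted above arise.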