CWC: A Clustering-Based Feature Weighting Approach for Text Classification

  • Authors: Lin Zhu; Jihong Guan; Shuigeng Zhou

  • Affiliations: Department of Computer Science and Engineering, Fudan University, 200433, China; Department of Computer Science and Technology, Tongji University, 201804, China; Department of Computer Science and Engineering, Fudan University, 200433, China

  • Venue: MDAI '07 Proceedings of the 4th international conference on Modeling Decisions for Artificial Intelligence
  • Year: 2007

Abstract

Most existing text classification methods use the vector space model to represent documents, and the document vectors are evaluated by the TF-IDF method. However, TF-IDF weighting does not take into account the fact that the weight of a feature in a document is related not only to the document, but also to the class that document belongs to. In this paper, we present a Clustering-based feature Weighting approach for text Classification, or CWC for short. CWC takes each class in the training collection as a known cluster, and searches for feature weights iteratively to optimize the clustering objective function, so the best clustering result is achieved, and documents in different classes can be best distinguished by using the resulting feature weights. Performance of CWC is validated by conducting classification over two real text collections, and experimental results show that CWC outperforms the traditional KNN.
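
The idea in the abstract can be illustrated with a minimal Python sketch. It computes plain TF-IDF vectors (which ignore class labels), derives per-feature weights from the class labels, and plugs those weights into a KNN distance. The abstract does not give CWC's objective function or its iterative update rule, so the weighting used here is an assumption: a one-shot heuristic that scores each feature by the share of its scatter explained by class membership. It is a stand-in for, not a reproduction of, the authors' method. All function names (`fit_idf`, `vectorize`, `class_aware_weights`, `knn_predict`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Build a sorted vocabulary and IDF table from tokenized documents."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    vocab = sorted(df)
    idf = {t: math.log(n / df[t]) for t in vocab}
    return vocab, idf

def vectorize(doc, vocab, idf):
    """Class-agnostic TF-IDF vector; out-of-vocabulary tokens are ignored."""
    tf = Counter(doc)
    return [tf[t] / len(doc) * idf[t] for t in vocab]

def class_aware_weights(vectors, labels):
    """ASSUMED heuristic, not the paper's update rule: weight each feature by
    between-class scatter divided by total scatter, then normalize to sum 1."""
    dims, n = len(vectors[0]), len(vectors)
    overall = [sum(col) / n for col in zip(*vectors)]
    total = [sum((v[j] - overall[j]) ** 2 for v in vectors) for j in range(dims)]
    between = [0.0] * dims
    for c in set(labels):
        # Each class is treated as a known cluster with its own centroid.
        members = [v for v, y in zip(vectors, labels) if y == c]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        for j in range(dims):
            between[j] += len(members) * (centroid[j] - overall[j]) ** 2
    raw = [b / t if t > 0 else 0.0 for b, t in zip(between, total)]
    s = sum(raw)
    # Fall back to uniform weights if no feature separates the classes.
    return [r / s for r in raw] if s > 0 else [1.0 / dims] * dims

def knn_predict(query, vectors, labels, weights, k=3):
    """Majority vote among the k nearest neighbours under a weighted
    Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
    nearest = sorted(zip(vectors, labels), key=lambda p: dist(query, p[0]))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

# Toy corpus (hypothetical): "news" and "report" occur in both classes.
docs = [
    ["ball", "goal", "team", "news"],
    ["goal", "team", "match", "report"],
    ["stock", "market", "bank", "news"],
    ["market", "bank", "trade", "report"],
]
labels = ["sport", "sport", "finance", "finance"]

vocab, idf = fit_idf(docs)
train = [vectorize(d, vocab, idf) for d in docs]
weights = class_aware_weights(train, labels)
prediction = knn_predict(vectorize(["goal", "team", "news"], vocab, idf),
                         train, labels, weights, k=3)
print(prediction)
```

In this toy setup, a term such as "news" has the same TF-IDF profile in both classes and ends up with a weight of zero, whereas "goal" keeps a positive weight. That is the kind of class-dependent distinction the abstract says plain TF-IDF misses.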