A Non-VSM kNN algorithm for text classification

  • Authors:
  • Zhi-Hong Deng;Shi-Wei Tang

  • Affiliations:
  • National Laboratory on Machine Perception, School of Electronics Engineering and Computer Science, Peking University, Beijing, China;National Laboratory on Machine Perception, School of Electronics Engineering and Computer Science, Peking University, Beijing, China

  • Venue:
  • ADMA'05: Proceedings of the First International Conference on Advanced Data Mining and Applications
  • Year:
  • 2005

Abstract

Text classification, the task of assigning natural language texts to predefined categories based on their content, has been widely studied. Traditional text classification methods represent documents with the VSM (Vector Space Model), which views documents as vectors in a high-dimensional space. In this paper, we propose a non-VSM kNN algorithm for text classification. Based on correlations between categories and features, the algorithm first extracts k F-C (feature-category) tuples from an unlabeled document, namely the k tuples with the highest correlation values. The algorithm then predicts the category of the unlabeled document from these tuples. We have evaluated the algorithm on two document collections and compared it against traditional kNN. Experimental results show that our algorithm outperforms traditional kNN in both efficiency and effectiveness.
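
The two steps described in the abstract (select the k F-C tuples with the highest correlation, then predict from them) can be illustrated with a minimal sketch. The abstract does not specify the correlation measure or the prediction rule, so both are assumptions here: correlation is taken to be the fraction of training documents containing a feature that belong to a category, and the predicted category is the one with the largest summed correlation among the selected tuples. The names `train_correlations` and `classify` are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def train_correlations(docs):
    """Estimate a correlation score for every (feature, category) pair.

    docs is an iterable of (tokens, category) pairs.
    ASSUMPTION: the score is the share of training documents containing
    the feature that carry the category; the paper may define it differently.
    """
    feature_docs = Counter()  # number of documents containing each feature
    pair_docs = Counter()     # number of documents per (feature, category)
    for tokens, category in docs:
        for feature in set(tokens):
            feature_docs[feature] += 1
            pair_docs[(feature, category)] += 1
    return {(f, c): n / feature_docs[f] for (f, c), n in pair_docs.items()}


def classify(tokens, correlations, k=3):
    """Predict a category from the k highest-scoring F-C tuples of a document.

    ASSUMPTION: the prediction rule sums the scores of the selected tuples
    per category and returns the category with the largest total.
    """
    features = set(tokens)
    candidates = [
        (score, f, c) for (f, c), score in correlations.items() if f in features
    ]
    # Highest score first; ties are broken alphabetically so results are stable.
    top_k = sorted(candidates, key=lambda t: (-t[0], t[1], t[2]))[:k]
    votes = defaultdict(float)
    for score, _, category in top_k:
        votes[category] += score
    if not votes:
        return None  # no feature of the document was seen in training
    return max(sorted(votes), key=votes.get)


# Toy training set, invented purely for illustration.
training = [
    (["goal", "match", "team"], "sports"),
    (["team", "coach", "win"], "sports"),
    (["stock", "market", "win"], "finance"),
    (["market", "bank", "rate"], "finance"),
]
correlations = train_correlations(training)
print(classify(["team", "goal", "win"], correlations, k=3))  # sports
```

Unlike VSM-based kNN, this sketch never builds document vectors or computes distances to training documents: classification only looks up precomputed feature-category scores, which is one plausible reading of why the approach could be cheaper at prediction time.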