A Linear Text Classification Algorithm Based on Category Relevance Factors

  • Authors:
  • Zhi-Hong Deng;Shi-Wei Tang;Dong-Qing Yang;Ming Zhang;Xiao-Bin Wu;Meng Yang

  • Affiliations:
  • -;-;-;-;-;-

  • Venue:
  • ICADL '02 Proceedings of the 5th International Conference on Asian Digital Libraries: Digital Libraries: People, Knowledge, and Technology
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we present a linear text classification algorithm called CRF. By using category relevance factors, CRF computes the feature vectors of training documents belonging to the same category. Based on these feature vectors, CRF induces the profile vector of each category. For new unlabelled documents, CRF adopts a modified cosine measure to obtain similarities between these documents and categories and assigns them to categories that have the biggest similarity scores. In CRF, it is profile vectors not vectors of all training documents that join in computing the similarities between documents and categories. We evaluated our algorithm on a subset of Reuters-21578 and 20_newsgroups text collections and compared it against k-NN and SVM. Experimental results show that CRF outperforms k-NN and is competitive with SVM.