A comprehensive comparative study on term weighting schemes for text categorization with support vector machines

  • Authors:
  • Man Lan;Chew-Lim Tan;Hwee-Boon Low;Sam-Yuan Sung

  • Affiliations:
  • Institute for Infocomm Research, Singapore;National University of Singapore, Singapore;Institute for Infocomm Research, Singapore;National University of Singapore, Singapore

  • Venue:
  • WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
  • Year:
  • 2005

Abstract

A term weighting scheme, which converts documents into vectors in the term space, is a vital step in automatic text categorization. In this paper, we conducted comprehensive experiments to compare various term weighting schemes with SVM on two widely used benchmark data sets. We also presented a new term weighting scheme, tf-rf, to improve a term's discriminating power. The controlled experimental results showed that the newly proposed tf-rf scheme is significantly better than other widely used term weighting schemes. Compared with schemes based on the tf factor alone, adding the idf factor does not improve, and can even decrease, a term's discriminating power for text categorization.
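
The abstract names the schemes but does not define them, so the following is only a minimal sketch of how tf, tf-idf, and tf-rf weights might be computed for a single binary category. The tf and tf-idf forms are the standard ones. The rf factor, written here as log2(2 + a / max(1, c)) with a and c the numbers of positive and negative documents containing the term, is an assumption based on how relevance frequency is commonly described and is not stated in the abstract. The function name `term_weights` and its parameters are hypothetical.

```python
import math
from collections import Counter


def term_weights(docs, labels, scheme="tf"):
    """Return one {term: weight} dict per tokenized document.

    docs   : list of token lists
    labels : list of bools, True if the document is in the positive category
    scheme : "tf", "tf-idf", or "tf-rf"
    """
    n_docs = len(docs)
    df = Counter()      # number of documents containing each term
    pos_df = Counter()  # number of positive documents containing each term
    for tokens, is_pos in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            if is_pos:
                pos_df[term] += 1

    def collection_factor(term):
        if scheme == "tf":
            return 1.0
        if scheme == "tf-idf":
            # Standard inverse document frequency.
            return math.log(n_docs / df[term])
        if scheme == "tf-rf":
            # ASSUMPTION: rf = log2(2 + a / max(1, c)); not given in the abstract.
            a = pos_df[term]        # positive documents containing the term
            c = df[term] - a        # negative documents containing the term
            return math.log2(2 + a / max(1, c))
        raise ValueError(f"unknown scheme: {scheme}")

    return [
        {term: tf * collection_factor(term) for term, tf in Counter(tokens).items()}
        for tokens in docs
    ]
```

Under this sketch, idf depends only on how many documents contain a term, while rf also uses the category labels, which is one way to read the abstract's claim about discriminating power. The resulting weight dicts would still need to be turned into fixed-length vectors before being passed to an SVM.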