A semantic term weighting scheme for text categorization

  • Authors:
  • Qiming Luo; Enhong Chen; Hui Xiong

  • Affiliations:
  • School of Computer Science and Technology, University of Science and Technology of China, China and MOE-MS Key Laboratory of Multimedia Computing and Communication of USTC, China; School of Computer Science and Technology, University of Science and Technology of China, China and MOE-MS Key Laboratory of Multimedia Computing and Communication of USTC, China; Management Science and Information Systems Department, Rutgers University, USA

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Quantified Score

Hi-index 12.06

Abstract

Traditional term weighting schemes for text categorization, such as TF-IDF, exploit only the statistical information of terms in documents. In this paper, we instead propose a novel term weighting scheme that exploits the semantics of categories and indexing terms. Specifically, the semantics of a category are represented by the senses of the terms appearing in its label, together with their interpretation in WordNet. The weight of a term is then correlated with its semantic similarity to a category. Experimental results on three commonly used data sets show that the proposed approach outperforms TF-IDF when the amount of training data is small or when the content of documents is focused on well-defined categories. In addition, the proposed approach compares favorably with two previous studies.
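
The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not state which similarity measure is used or how similarity is combined with term frequency. The sketch assumes a tiny hand-made hypernym table (`TOY_HYPERNYMS`, a hypothetical stand-in for WordNet), a simple path-based similarity, and a weight equal to term frequency times similarity to a single-word category label. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy (child -> parent), standing in for WordNet hypernym links.
TOY_HYPERNYMS = {
    "wheat": "grain",
    "corn": "grain",
    "grain": "crop",
    "crop": "entity",
    "bank": "institution",
    "institution": "entity",
}


def path_to_root(term):
    """Return the chain [term, parent, grandparent, ...] in the toy taxonomy."""
    path = [term]
    while path[-1] in TOY_HYPERNYMS:
        path.append(TOY_HYPERNYMS[path[-1]])
    return path


def path_similarity(a, b):
    """Assumed measure: 1 / (1 + path length through the lowest common ancestor).

    Returns 0.0 when the two terms share no ancestor in the toy taxonomy.
    """
    path_a, path_b = path_to_root(a), path_to_root(b)
    depth_in_b = {node: i for i, node in enumerate(path_b)}
    for i, node in enumerate(path_a):
        if node in depth_in_b:
            return 1.0 / (1 + i + depth_in_b[node])
    return 0.0


def semantic_weights(doc_tokens, category_label):
    """Assumed combination: term frequency scaled by similarity to the category label."""
    tf = Counter(doc_tokens)
    return {t: tf[t] * path_similarity(t, category_label) for t in tf}


def tf_idf_weights(doc_tokens, corpus):
    """Plain TF-IDF baseline; doc_tokens must be one of the documents in corpus."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    return {
        t: tf[t] * math.log(n_docs / sum(1 for d in corpus if t in d))
        for t in tf
    }


corpus = [["wheat", "wheat", "bank"], ["bank", "loan"]]
sem = semantic_weights(corpus[0], "grain")
stat = tf_idf_weights(corpus[0], corpus)
```

Here `sem` depends only on how close each term is to the label "grain" in the toy taxonomy, whereas `stat` depends only on corpus counts, so the semantic weights do not need training documents. That matches the small-training-data setting in which the abstract reports an advantage over TF-IDF.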