Extract semantic information from Wordnet to improve text classification performance

  • Authors:
  • Rujiang Bai;Xiaoyue Wang;Junhua Liao

  • Affiliations:
  • Shandong University of Technology, Zibo, China;Shandong University of Technology, Zibo, China;Shandong University of Technology, Zibo, China

  • Venue:
  • AST/UCMA/ISA/ACN'10 Proceedings of the 2010 international conference on Advances in computer science and information technology
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Abstract

For the past decade, text categorization has been an active field of research in the machine learning community. Most approaches are based on term occurrence frequency. The performance of such surface-based methods can degrade when the texts are too complex, i.e., ambiguous. One alternative is to use semantic-based approaches, which process textual documents according to their meaning. In this paper, we propose a Concept-based Vector Space Model, which represents a text by a more abstract form of its semantic information than the standard Vector Space Model. This model adjusts the weights of the Vector Space by importing the hypernymy-hyponymy relation between synonym sets (synsets) and the Concept Chain in WordNet. Experimental results on several data sets show that the proposed approach, with concepts built from WordNet, can achieve significant improvements over the baseline algorithm.
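
The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea: term counts are propagated up a chain of hypernyms so that documents with different surface terms can still share concept features. The `TOY_HYPERNYMS` map, the `max_depth` limit, and the `decay` factor are all assumptions made for illustration; they stand in for WordNet's hypernymy relation and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy hypernym map standing in for WordNet's hypernymy relation.
TOY_HYPERNYMS = {
    "beef": "meat",
    "pork": "meat",
    "meat": "food",
    "apple": "fruit",
    "fruit": "food",
}


def concept_chain(term, max_depth=2):
    """Return up to max_depth hypernyms of term, nearest first."""
    chain = []
    current = term
    while current in TOY_HYPERNYMS and len(chain) < max_depth:
        current = TOY_HYPERNYMS[current]
        chain.append(current)
    return chain


def concept_vector(tokens, max_depth=2, decay=0.5):
    """Build a term-frequency vector extended with hypernym concepts.

    A hypernym at distance d from a term receives decay**d times the
    term's count. Both the decay scheme and its default value are
    assumptions for this sketch, not the paper's weighting.
    """
    counts = Counter(tokens)
    vector = dict(counts)  # start from the plain term counts
    for term, freq in counts.items():
        for depth, concept in enumerate(concept_chain(term, max_depth), start=1):
            vector[concept] = vector.get(concept, 0.0) + freq * decay ** depth
    return vector


if __name__ == "__main__":
    # Two documents with no shared surface terms still overlap on "meat" and "food".
    print(concept_vector(["beef"]))
    print(concept_vector(["pork"]))
```

In a plain term-frequency model the documents `["beef"]` and `["pork"]` have no features in common; with the concept features added, both carry weight on `meat` and `food`, which is the kind of abstraction the Concept-based Vector Space Model aims for.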