Improving text classification by a sense spectrum approach to term expansion

  • Authors:
  • Peter Wittek;Sándor Darányi;Chew Lim Tan

  • Affiliations:
  • National University of Singapore, Singapore;Göteborg University & University of Borås, Borås, Sweden;National University of Singapore, Singapore

  • Venue:
  • CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
  • Year:
  • 2009

Abstract

Experimenting with different mathematical objects for text representation is an important step in building text classification models. To be efficient, the objects of such a formal model, for example vectors, have to reproduce reasonably well language-related phenomena such as the word meaning inherent in index terms. We introduce an algorithm for sense-based semantic ordering of index terms which approximates Cruse's description of a sense spectrum. Following semantic ordering, text classification by support vector machines can benefit from semantic smoothing kernels that take semantic relations among index terms into account when computing document similarity. Adding expansion terms to the vector representation can also improve effectiveness. This paper proposes a new kernel which discounts less important expansion terms based on lexical relatedness.
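
As a rough illustration of what a semantic smoothing kernel does, the sketch below computes document similarity as k(x, y) = x^T S S^T y, where S is a term-by-term relatedness matrix. This is the generic textbook form of such a kernel, not the new kernel proposed in the paper; the three-word vocabulary, the values in S, and the function name `semantic_smoothing_kernel` are assumptions made up for the example.

```python
import numpy as np


def semantic_smoothing_kernel(X, Y, S):
    """Return the Gram matrix k(x, y) = x^T S S^T y for rows of X and Y.

    X, Y: document-term matrices (one document per row).
    S:    term-term relatedness matrix (illustrative values, not from the paper).
    """
    XS = X @ S  # spread each document's weight onto related terms
    YS = Y @ S
    return XS @ YS.T


# Hypothetical vocabulary: ["car", "automobile", "banana"]
S = np.array([
    [1.0, 0.8, 0.0],   # "car" is assumed to be closely related to "automobile"
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],   # "banana" is assumed unrelated to the other two terms
])

docs = np.array([
    [1.0, 0.0, 0.0],   # document mentioning only "car"
    [0.0, 1.0, 0.0],   # document mentioning only "automobile"
    [0.0, 0.0, 1.0],   # document mentioning only "banana"
])

linear = docs @ docs.T                               # plain bag-of-words kernel
smoothed = semantic_smoothing_kernel(docs, docs, S)  # smoothed kernel

print(linear[0, 1], smoothed[0, 1], smoothed[0, 2])
```

With the plain linear kernel the "car" and "automobile" documents share no terms and so have zero similarity, while the smoothed kernel gives them a positive score and still keeps "banana" apart. Because the matrix has the form (XS)(XS)^T it is positive semidefinite, so it could be handed to a support vector machine as a precomputed kernel.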