Feature Selection Analysis for Maximum Entropy-Based WSD

  • Authors:
  • Armando Suárez; Manuel Palomar

  • Venue:
  • CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2002

Abstract

Supervised learning in a corpus-based Word Sense Disambiguation (WSD) system relies on a set of linguistic contexts that have already been classified by sense. To train the system, it is usual to define a set of functions that indicate which linguistic features are present in each example. It is also usual to look for the same kind of information for every word, or at least for all words of the same part of speech. This paper presents a study of feature selection in a supervised, corpus-based WSD method built on Maximum Entropy conditional probability models. For a few words selected from the DSO corpus, the behaviour of several types of features is analyzed in order to identify their contribution to gains in accuracy and to determine the influence of sense frequency in that corpus. The results show that not all words are best disambiguated with the same combination of features. An improved definition of features, intended to increase efficiency, is also presented.
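
As an illustration only, the general shape of a Maximum Entropy conditional model over binary feature functions can be sketched as follows. This is not the authors' implementation: the feature template (a word at a fixed offset from the target), the sense labels, the weights, and the names make_feature and conditional_probs are all hypothetical, chosen to show how feature functions and weights combine into p(sense | context).

```python
import math

def make_feature(word, offset, sense):
    """Build a binary feature function (hypothetical template).

    The feature fires (returns 1.0) when `word` occurs at `offset`
    relative to the target word and the candidate sense is `sense`.
    """
    def feature(context, candidate):
        if candidate == sense and context.get(offset) == word:
            return 1.0
        return 0.0
    return feature

def conditional_probs(context, senses, features, weights):
    """Standard log-linear form: p(s | x) = exp(sum_i w_i * f_i(x, s)) / Z(x)."""
    scores = {
        s: math.exp(sum(w * f(context, s) for f, w in zip(features, weights)))
        for s in senses
    }
    z = sum(scores.values())  # normalisation over all candidate senses
    return {s: v / z for s, v in scores.items()}

# Toy data, invented for illustration: two senses of "bank".
senses = ["bank/shore", "bank/finance"]
features = [
    make_feature("river", -1, "bank/shore"),
    make_feature("money", 1, "bank/finance"),
]
weights = [1.2, 0.8]  # assumed values; in practice these are estimated from the training corpus

# Context maps offsets from the target word to surrounding words.
context = {-1: "river", 1: "was"}
probs = conditional_probs(context, senses, features, weights)
```

Feature selection in this setting amounts to choosing which feature templates are instantiated for a given word; the abstract's finding is that the best choice varies from word to word.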