A Naïve Bayes approach to cross-lingual word sense disambiguation and lexical substitution

  • Authors:
  • David Pinto; Darnes Vilariño; Carlos Balderas; Mireya Tovar; Beatriz Beltrán

  • Affiliations:
  • Faculty of Computer Science, B. Autonomous University of Puebla, Mexico (all authors)

  • Venue:
  • MCPR'10: Proceedings of the 2nd Mexican Conference on Pattern Recognition: Advances in Pattern Recognition
  • Year:
  • 2010

Abstract

Word Sense Disambiguation (WSD) is considered one of the most important problems in Natural Language Processing [1]. It is claimed that WSD is essential for applications that require language comprehension modules, such as search engines, machine translation systems, automatic question-answering systems and Second Life agents. Moreover, the huge amount of information on the Internet keeps growing in many different languages, which encourages us to deal with cross-lingual scenarios where WSD systems are also needed. Lexical Substitution (LS), on the other hand, refers to the process of finding a substitute word for a source word in a given sentence. Because LS must be approached by first disambiguating the source word, the two tasks (WSD and LS) are closely related. In this paper, we present a naïve Bayes approach to the problems of cross-lingual WSD and cross-lingual lexical substitution. We use a bilingual statistical dictionary, computed with Giza++ on the EUROPARL parallel corpus, to estimate the probability that a source word is translated into a given target word (the target word is assumed to be the correct sense of the source word, expressed in a different language). Two versions of the probabilistic model are tested: unweighted and weighted. The results were compared with those of an international competition, and the approach showed good performance.
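The abstract does not give the exact model, so the following is only a minimal sketch of how a naïve Bayes scorer over a bilingual statistical dictionary could pick a target word. It assumes the score of a candidate translation t of source word w in context c_1..c_n is P(t | w) · Π_i P(c_i | t)^λ_i, i.e. the context words are treated as conditionally independent given t, with P(t | w) and P(c_i | t) read from source-to-target and target-to-source lexical tables such as those Giza++ can produce. The toy tables `TOY_SRC2TGT`/`TOY_TGT2SRC`, the function names, the smoothing constant `eps` and the per-word `weights` (standing in for the "weighted" variant) are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Optional

# Hypothetical lexical tables standing in for Giza++ output (toy values):
#   SRC2TGT[w][t] = P(t | w)  (English source word -> Spanish target word)
#   TGT2SRC[t][c] = P(c | t)  (Spanish target word -> English word)
Table = Dict[str, Dict[str, float]]

TOY_SRC2TGT: Table = {"bank": {"banco": 0.7, "orilla": 0.3}}
TOY_TGT2SRC: Table = {
    "banco": {"bank": 0.8, "bench": 0.2},
    "orilla": {"bank": 0.5, "shore": 0.4, "edge": 0.1},
}


def score_candidates(
    word: str,
    context: List[str],
    src2tgt: Table,
    tgt2src: Table,
    weights: Optional[List[float]] = None,
    eps: float = 1e-6,
) -> Dict[str, float]:
    """Log-score every candidate translation t of `word`:

    log P(t | word) + sum_i weights[i] * log(P(context[i] | t) + eps)

    weights=None gives every context word weight 1 (unweighted variant).
    eps is a hypothetical additive smoothing constant for unseen pairs.
    Working in log space avoids underflow for long contexts.
    """
    if weights is None:
        weights = [1.0] * len(context)
    if len(weights) != len(context):
        raise ValueError("need exactly one weight per context word")
    scores: Dict[str, float] = {}
    for target, p_translation in src2tgt.get(word, {}).items():
        if p_translation <= 0.0:
            continue  # log(0) is undefined; such a candidate can never win
        score = math.log(p_translation)
        for ctx_word, weight in zip(context, weights):
            p_ctx = tgt2src.get(target, {}).get(ctx_word, 0.0)
            score += weight * math.log(p_ctx + eps)
        scores[target] = score
    return scores


def best_translation(
    word: str,
    context: List[str],
    src2tgt: Table,
    tgt2src: Table,
    weights: Optional[List[float]] = None,
) -> Optional[str]:
    """Return the highest-scoring target word, or None if `word` is unknown."""
    scores = score_candidates(word, context, src2tgt, tgt2src, weights)
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    ctx = ["river", "shore"]
    print(best_translation("bank", ctx, TOY_SRC2TGT, TOY_TGT2SRC))  # orilla
    print(best_translation("bank", ctx, TOY_SRC2TGT, TOY_TGT2SRC, [1.0, 0.0]))  # banco
```

In the toy run, the context word "shore" supports "orilla" strongly enough to overcome the higher prior P(banco | bank); setting its weight to 0 removes that evidence, so the scorer falls back to the more frequent translation. How the paper actually assigns weights is not stated in the abstract.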