A Naïve Bayes approach to cross-lingual word sense disambiguation and lexical substitution

Authors:
David Pinto;Darnes Vilariño;Carlos Balderas;Mireya Tovar;Beatriz Beltrán
Affiliations:
Faculty of Computer Science, B. Autonomous University of Puebla, Mexico;Faculty of Computer Science, B. Autonomous University of Puebla, Mexico;Faculty of Computer Science, B. Autonomous University of Puebla, Mexico;Faculty of Computer Science, B. Autonomous University of Puebla, Mexico;Faculty of Computer Science, B. Autonomous University of Puebla, Mexico
Venue:
MCPR'10 Proceedings of the 2nd Mexican conference on Pattern recognition: Advances in pattern recognition
Year:
2010

Citing 4
Cited 0

Word Sense Disambiguation: Algorithms and Applications (Text, Speech and Language Technology)
SemEval-2007 task 10: English lexical substitution task. SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
SemEval-2010 task 2: Cross-lingual lexical substitution. SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
SemEval-2010 task 3: Cross-lingual word sense disambiguation. SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation

Abstract

Word Sense Disambiguation (WSD) is considered one of the most important problems in Natural Language Processing [1]. WSD is claimed to be essential for applications that require language comprehension modules, such as search engines, machine translation systems, automatic answering machines, Second Life agents, etc. Moreover, given the huge amount of information on the Internet, which is continuously growing in different languages, we are encouraged to deal with cross-lingual scenarios where WSD systems are also needed. Lexical Substitution (LS), in turn, refers to the process of finding a substitute word for a source word in a given sentence. The LS task needs to be approached by first disambiguating the source word; therefore, the two tasks (WSD and LS) are related. In this paper, we present a naïve approach to the problems of cross-lingual WSD and cross-lingual lexical substitution. We use a bilingual statistical dictionary, computed with Giza++ from the EUROPARL parallel corpus, to calculate the probability of a source word being translated to a target word (which is assumed to be the correct sense of the source word, but in a different language). Two versions of the probabilistic model are tested: unweighted and weighted. The results were compared with those of an international competition, showing good performance.
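
The abstract does not give the exact formula, so the following is only a minimal sketch of how a Naïve Bayes style scoring over a bilingual statistical dictionary might look. The toy probability table, the smoothing constant, the function names, and in particular the distance-based weighting used for the "weighted" variant are all assumptions made for illustration; they are not taken from the paper. A real table would be derived from Giza++ alignments over the EUROPARL corpus.

```python
import math

# Hypothetical toy dictionary: P(target word | source word).
# The values are invented for illustration only.
TRANSLATION_PROBS = {
    "bank": {"banco": 0.6, "orilla": 0.4},
    "money": {"banco": 0.3, "dinero": 0.7},
    "river": {"orilla": 0.5, "río": 0.5},
    "deposit": {"banco": 0.4, "depósito": 0.6},
}

# Assumed probability floor for (source, target) pairs missing from the table.
SMOOTHING = 1e-6


def score_candidates(context, ambiguous_word, weighted=False):
    """Return a log-score for each candidate translation of ambiguous_word.

    Each context word contributes log P(candidate | context word), treating
    the context words as independent (the Naïve Bayes assumption). In the
    weighted variant, a word's contribution is scaled by 1 / (1 + distance)
    from the ambiguous word (an assumed weighting scheme).
    """
    position = context.index(ambiguous_word)
    scores = {}
    for candidate in TRANSLATION_PROBS[ambiguous_word]:
        log_score = 0.0
        for i, word in enumerate(context):
            prob = TRANSLATION_PROBS.get(word, {}).get(candidate, SMOOTHING)
            weight = 1.0 / (1 + abs(i - position)) if weighted else 1.0
            log_score += weight * math.log(prob)
        scores[candidate] = log_score
    return scores


def best_translation(context, ambiguous_word, weighted=False):
    """Pick the candidate translation with the highest log-score."""
    scores = score_candidates(context, ambiguous_word, weighted)
    return max(scores, key=scores.get)
```

For example, `best_translation(["deposit", "money", "bank"], "bank")` selects the candidate that the surrounding words also tend to translate to, while passing `weighted=True` lets words closer to the ambiguous word count for more.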