ParaSense or how to use parallel corpora for word sense disambiguation

  • Authors:
  • Els Lefever;Véronique Hoste;Martine De Cock

  • Affiliations:
  • University College Ghent, Groot-Brittanniëlaan, Gent, Belgium and Ghent University, Krijgslaan, Gent, Belgium;University College Ghent, Groot-Brittanniëlaan, Gent, Belgium and Ghent University, Krijgslaan, Gent, Belgium and Ghent University, Blandijnberg, Gent, Belgium;Ghent University, Krijgslaan, Gent, Belgium

  • Venue:
  • HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
  • Year:
  • 2011

Abstract

This paper describes a set of exploratory experiments for a multilingual classification-based approach to Word Sense Disambiguation (WSD). Instead of using a predefined monolingual sense inventory such as WordNet, we use a language-independent framework in which the word senses are derived automatically from word alignments on a parallel corpus. We built five classifiers with English as the input language and translations in the five supported languages (viz. French, Dutch, Italian, Spanish and German) as classification output. The feature vectors incorporate both the more traditional local context features and binary bag-of-words features that are extracted from the aligned translations. Our results show that the ParaSense multilingual WSD system is very competitive with the best systems evaluated on the SemEval-2010 Cross-Lingual Word Sense Disambiguation task for all five target languages.
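
The feature design described in the abstract can be illustrated with a minimal sketch. This is not the ParaSense implementation: the function names, the window size, the example sentence, and the tiny Dutch vocabulary are all hypothetical, chosen only to show how local context features and binary bag-of-words features from a translation might be combined into one feature vector.

```python
from typing import Dict, List, Union

Feature = Union[str, int]


def local_context_features(tokens: List[str], target_index: int,
                           window: int = 2) -> Dict[str, Feature]:
    """Words at fixed offsets around the ambiguous word ('_' pads sentence edges)."""
    features: Dict[str, Feature] = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        position = target_index + offset
        inside = 0 <= position < len(tokens)
        features[f"word[{offset:+d}]"] = tokens[position].lower() if inside else "_"
    return features


def translation_bow_features(translations: Dict[str, List[str]],
                             vocabulary: Dict[str, List[str]]) -> Dict[str, Feature]:
    """Binary bag-of-words: 1 if a vocabulary word occurs in the translation, else 0."""
    features: Dict[str, Feature] = {}
    for language, vocab_words in vocabulary.items():
        present = {word.lower() for word in translations.get(language, [])}
        for word in vocab_words:
            features[f"bow[{language}:{word}]"] = 1 if word in present else 0
    return features


def build_feature_vector(tokens: List[str], target_index: int,
                         translations: Dict[str, List[str]],
                         vocabulary: Dict[str, List[str]]) -> Dict[str, Feature]:
    """Concatenate both feature groups into a single dictionary."""
    combined = local_context_features(tokens, target_index)
    combined.update(translation_bow_features(translations, vocabulary))
    return combined


# Hypothetical example: the ambiguous English word "bank" with a Dutch translation.
english = ["She", "sat", "on", "the", "bank", "of", "the", "river"]
dutch = {"nl": ["Zij", "zat", "op", "de", "oever", "van", "de", "rivier"]}
vocab = {"nl": ["oever", "rivier", "geld"]}  # assumed content-word vocabulary

vector = build_feature_vector(english, 4, dutch, vocab)
for name, value in vector.items():
    print(name, value)
```

A dictionary of this shape could then be passed to a vectorizer and any standard classifier whose output label is the translation of the ambiguous word in the target language; which learner and which feature details the authors actually used is not stated in the abstract.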