Five languages are better than one: an attempt to bypass the data acquisition bottleneck for WSD

  • Authors:
  • Els Lefever; Véronique Hoste; Martine De Cock

  • Affiliations:
  • LT3, Language and Translation Technology Team, University College, Ghent, Belgium, and Department of Applied Mathematics and Computer Science, Ghent University, Belgium; Department of Linguistics, Ghent University, Belgium, and LT3, Language and Translation Technology Team, University College, Ghent, Belgium; Department of Applied Mathematics and Computer Science, Ghent University, Belgium

  • Venue:
  • CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
  • Year:
  • 2013

Abstract

This paper presents a multilingual classification-based approach to Word Sense Disambiguation (WSD) that directly incorporates translational evidence from four other languages. The need for a large predefined monolingual sense inventory (such as WordNet) is avoided by taking a language-independent approach in which the word senses are derived automatically from word alignments on a parallel corpus. As a consequence, the task becomes a cross-lingual WSD task, which consists of selecting the contextually correct translation of an ambiguous target word. To evaluate the viability of cross-lingual Word Sense Disambiguation, we built five classifiers with English as the input language and translations in the five supported languages (viz. French, Dutch, Italian, Spanish and German) as classification output. The feature vectors incorporate both local context features and translation features that are extracted from the aligned translations. The experimental results confirm the validity of our approach: the classifiers that employ translational evidence outperform the classifiers that exploit only local context information. Furthermore, a comparison with state-of-the-art systems for the same task revealed that our system outperforms all other systems for all five target languages.
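
As a rough illustration of the feature design described in the abstract, the sketch below combines local context features with bag-of-words translation features and predicts a French translation for an English target word. It is a minimal sketch under stated assumptions, not the authors' system: the function names, the window size, the toy sentences, and the nearest-neighbour overlap classifier are all hypothetical choices made for illustration, since the abstract does not specify the learner or the exact feature encoding.

```python
# Hypothetical sketch: local context + translation features for cross-lingual WSD.
# All names, sentences, and the 1-nearest-neighbour rule are illustrative assumptions.

def local_context_features(tokens, target_index, window=2):
    """Words around the ambiguous target word, keyed by relative position."""
    features = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        position = target_index + offset
        if 0 <= position < len(tokens):
            features[f"ctx[{offset:+d}]={tokens[position].lower()}"] = 1
    return features

def translation_features(aligned_translations):
    """Bag-of-words features from the sentence translations in the other languages."""
    features = {}
    for language, sentence in aligned_translations.items():
        for word in sentence.lower().split():
            features[f"{language}:{word}"] = 1
    return features

def build_feature_vector(tokens, target_index, aligned_translations):
    """Merge both feature groups into one sparse binary feature vector."""
    features = local_context_features(tokens, target_index)
    features.update(translation_features(aligned_translations))
    return features

def classify(features, training_examples):
    """Return the label of the training example that shares the most features."""
    best_label, best_overlap = None, -1
    for example_features, label in training_examples:
        overlap = len(features.keys() & example_features.keys())
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label

# Toy training data (invented): English "bank" with its French translation as label.
training_examples = [
    (build_feature_vector(
        "She deposited money at the bank yesterday".split(), 5,
        {"nl": "Ze stortte gisteren geld bij de bank",
         "it": "Ha depositato denaro in banca ieri",
         "es": "Ayer depositó dinero en el banco",
         "de": "Sie hat gestern Geld bei der Bank eingezahlt"}), "banque"),
    (build_feature_vector(
        "They walked along the bank of the river".split(), 4,
        {"nl": "Ze liepen langs de oever van de rivier",
         "it": "Camminavano lungo la riva del fiume",
         "es": "Caminaban por la orilla del río",
         "de": "Sie gingen am Ufer des Flusses entlang"}), "rive"),
]

test_features = build_feature_vector(
    "He sat on the bank of the stream".split(), 4,
    {"nl": "Hij zat op de oever van de beek",
     "it": "Sedeva sulla riva del ruscello",
     "es": "Estaba sentado en la orilla del arroyo",
     "de": "Er saß am Ufer des Baches"})

predicted = classify(test_features, training_examples)
print(predicted)
```

In this toy setup the translations in the four other languages share words such as "oever", "riva", "orilla" and "Ufer" with the riverbank training example, which is the kind of translational evidence the abstract refers to; a classifier restricted to `local_context_features` would have fewer cues to go on.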