QAlign: a new method for bilingual lexicon extraction from comparable corpora

  • Authors:
  • Amir Hazem;Emmanuel Morin

  • Affiliations:
  • Laboratoire d'Informatique de Nantes-Atlantique (LINA), Université de Nantes, Nantes Cedex 3, France;Laboratoire d'Informatique de Nantes-Atlantique (LINA), Université de Nantes, Nantes Cedex 3, France

  • Venue:
  • CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
  • Year:
  • 2012

Abstract

In this paper, we present a new way of looking at the problem of bilingual lexicon extraction from comparable corpora, inspired mainly by the information retrieval (IR) domain and, more specifically, by question-answering systems (QAS). By analogy with QAS, we treat a word to be translated as part of a question extracted from the source language, and we try to find the correct translation under the assumption that it is contained in the correct answer to that question, extracted from the target language. The methods traditionally used for bilingual lexicon extraction from comparable corpora tend to represent all the contexts of a word in a single vector and thus give a general representation of those contexts. We believe that a local representation of the contexts of a word, given by a window that corresponds to the query, is more appropriate, because it gives more weight to local information that could be swamped if all contexts were merged into a single context vector. We show that the empirical results obtained are competitive with those of the standard approach traditionally used for this task.
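
The contrast between a single context vector and local windows can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the window size, the weighting scheme, or how source windows are matched against target-language text, so the function names (`local_windows`, `global_vector`), the window size of 2, and the toy sentence are all assumptions made for illustration only.

```python
from collections import Counter


def local_windows(tokens, target, size=2):
    """Return one bag of context words per occurrence of `target`.

    Each bag plays the role of a separate local "query" (assumed reading
    of the abstract; the window size is a hypothetical choice).
    """
    windows = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - size):i]
            right = tokens[i + 1:i + 1 + size]
            windows.append(Counter(left + right))
    return windows


def global_vector(tokens, target, size=2):
    """Merge every local window into one context vector (raw counts)."""
    total = Counter()
    for window in local_windows(tokens, target, size):
        total.update(window)
    return total


# Toy corpus in which "bank" occurs in two different senses.
tokens = "the bank raised rates while the river bank flooded".split()
windows = local_windows(tokens, "bank")
merged = global_vector(tokens, "bank")
```

In this toy case the two windows keep the financial and river contexts of "bank" apart, whereas the merged vector mixes them into a single set of counts.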