QAlign: a new method for bilingual lexicon extraction from comparable corpora

Authors:
Amir Hazem;Emmanuel Morin
Affiliations:
Laboratoire d'Informatique de Nantes-Atlantique (LINA), Université de Nantes, Nantes Cedex 3, France;Laboratoire d'Informatique de Nantes-Atlantique (LINA), Université de Nantes, Nantes Cedex 3, France
Venue:
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
Year:
2012

Abstract

In this paper, we present a new way of looking at the problem of bilingual lexicon extraction from comparable corpora, inspired mainly by the information retrieval (IR) domain and, more specifically, by question-answering systems (QAS). By analogy with QAS, we treat a word to be translated as part of a question extracted from the source language, and we try to find the correct translation on the assumption that it is contained in the correct answer to that question, extracted from the target language. The methods traditionally dedicated to bilingual lexicon extraction from comparable corpora tend to represent all the contexts of a word in a single vector and thus give a general representation of its contexts. We believe that a local representation of the contexts of a word, given by a window that corresponds to the query, is more appropriate, because it gives more weight to local information that could be swallowed up in the volume if all contexts were merged into a single context vector. We show that the empirical results obtained are competitive with those of the standard approach traditionally dedicated to this task.
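
The contrast the abstract draws between a single whole-context vector and a local, window-sized representation can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy sentence, and the window size of 2 are assumptions chosen for illustration, and the sketch covers only how contexts are represented, not translation or answer ranking.

```python
from collections import Counter


def context_windows(tokens, target, size=2):
    """Local view: one bag of words per occurrence of `target`.

    Each bag could play the role of a separate query (hypothetical helper).
    """
    windows = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - size):i]
            right = tokens[i + 1:i + 1 + size]
            windows.append(Counter(left + right))
    return windows


def global_context_vector(tokens, target, size=2):
    """Global view: all windows of `target` summed into a single vector."""
    total = Counter()
    for window in context_windows(tokens, target, size):
        total.update(window)
    return total


# Toy corpus (assumption): "bank" occurs in two different contexts.
tokens = "the bank raised rates while the river bank flooded the town".split()

local_queries = context_windows(tokens, "bank")
global_vector = global_context_vector(tokens, "bank")

for query in local_queries:
    print(dict(query))
print(dict(global_vector))
```

In the toy sentence the two occurrences of "bank" yield two distinct local bags, whereas the summed vector mixes the financial and river contexts, which is the kind of dilution the abstract attributes to the single-vector representation.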