Clustering comparable corpora for bilingual lexicon extraction

Authors:
Bo Li;Eric Gaussier;Akiko Aizawa
Affiliations:
UJF-Grenoble/CNRS, France, LIG UMR;UJF-Grenoble/CNRS, France, LIG UMR;National Institute of Informatics, Tokyo, Japan
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Year:
2011

Citing 12
Cited 3

Phrasal translation and query expansion techniques for cross-language information retrieval

Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
A systematic comparison of various statistical alignment models

Computational Linguistics
An IR approach for translating new words from nonparallel, comparable texts

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Automatic identification of word translations from unrelated English and German corpora

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
An approach based on multilingual thesauri and model combination for bilingual lexicon extraction

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
A geometric view on bilingual lexicon extraction from comparable corpora

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Extracting parallel sub-sentential fragments from non-parallel corpora

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Improving translation lexicon induction from monolingual corpora via dependency contexts and part-of-speech equivalences

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Extracting bilingual dictionary from comparable corpora with dependency heterogeneity

NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Bilingual lexicon generation using non-aligned signatures

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Revisiting context-based projection methods for term-translation spotting in comparable corpora

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Improving corpus comparability for bilingual lexicon extraction from comparable corpora

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics

Detecting highly confident word translations from comparable corpora without any prior knowledge

EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
BiCWS: mining cognitive differences from bilingual web search results

WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering
A language modeling approach for extracting translation knowledge from comparable corpora

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

We study in this paper the problem of enhancing the comparability of bilingual corpora in order to improve the quality of bilingual lexicons extracted from comparable corpora. We introduce a clustering-based approach for enhancing corpus comparability which exploits the homogeneity feature of the corpus, and finally preserves most of the vocabulary of the original corpus. Our experiments illustrate the well-foundedness of this method and show that the bilingual lexicons obtained from the homogeneous corpus are of better quality than the lexicons obtained with previous approaches.