A language modeling approach for extracting translation knowledge from comparable corpora

Authors:
Razieh Rahimi;Azadeh Shakery
Affiliations:
School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran;School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
Venue:
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Year:
2013

Abstract

A main challenge in cross-language information retrieval (CLIR) is estimating a translation language model, since its quality directly affects retrieval performance. The translation language model is built from translation resources such as bilingual dictionaries, parallel corpora, or comparable corpora. High-quality resources are often unavailable for resource-scarce languages, so for these languages the efficient exploitation of commonly available resources such as comparable corpora becomes more crucial. In this paper, we focus on using only comparable corpora and on extracting translation information from them more efficiently. We propose a language modeling approach for estimating the translation language model. The proposed method is based on probability distribution estimation and is easier to tune than previous work, which relies on heuristic adjustments. Experimental results show a significant improvement in translation quality and CLIR performance over previous approaches.