Mining a Persian-English comparable corpus for cross-language information retrieval

Authors:
Homa B. Hashemi;Azadeh Shakery
Venue:
Information Processing and Management: an International Journal
Year:
2014

Abstract

Knowledge acquisition and bilingual terminology extraction from multilingual corpora are challenging tasks for cross-language information retrieval. In this study, we propose a novel method for mining high-quality translation knowledge from a Persian-English comparable corpus that we constructed, the University of Tehran Persian-English Comparable Corpus (UTPECC). We extract translation knowledge using a Term Association Network (TAN) constructed from term co-occurrences within the same language as well as term associations across the two languages. We further propose a post-processing step that checks the validity of term translations by detecting mistranslated terms as outliers. Evaluation results on two different data sets show that translating queries using UTPECC and the proposed methods significantly outperforms simple dictionary-based methods. Moreover, the experimental results show that our methods are especially effective at translating Out-Of-Vocabulary terms and at expanding query words with their associated terms.
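The abstract does not spell out how the association network or the outlier check is computed, so the following is only a minimal sketch of the general idea, not the authors' TAN. It assumes document-level co-occurrence over aligned document pairs, scores associations with the Dice coefficient (an assumption chosen for simplicity), and treats a candidate translation as an outlier when it has no same-language association with any other candidate. All function names and the romanized toy tokens are hypothetical.

```python
from collections import Counter
from itertools import product


def dice(joint, a, b):
    """Dice coefficient from a joint count and two marginal counts."""
    return 2.0 * joint / (a + b) if (a + b) else 0.0


def cross_language_associations(aligned_pairs):
    """Score (source term, target term) pairs by co-occurrence in aligned documents."""
    src_df, tgt_df, joint_df = Counter(), Counter(), Counter()
    for src_tokens, tgt_tokens in aligned_pairs:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        src_df.update(src_set)
        tgt_df.update(tgt_set)
        joint_df.update(product(src_set, tgt_set))
    return {(s, t): dice(n, src_df[s], tgt_df[t]) for (s, t), n in joint_df.items()}


def monolingual_associations(docs):
    """Score same-language term pairs by document-level co-occurrence."""
    df, joint = Counter(), Counter()
    for tokens in docs:
        terms = set(tokens)
        df.update(terms)
        joint.update((a, b) for a in terms for b in terms if a != b)
    return {(a, b): dice(n, df[a], df[b]) for (a, b), n in joint.items()}


def translate(term, cross, k=3):
    """Return the k target terms most strongly associated with a source term."""
    cands = [(t, score) for (s, t), score in cross.items() if s == term]
    return sorted(cands, key=lambda x: (-x[1], x[0]))[:k]


def drop_outliers(candidates, mono):
    """Keep candidates associated with at least one other candidate (assumed rule)."""
    terms = [t for t, _ in candidates]
    if len(terms) < 2:
        return list(candidates)
    kept = []
    for t, score in candidates:
        support = max(mono.get((t, o), 0.0) for o in terms if o != t)
        if support > 0.0:
            kept.append((t, score))
    return kept


# Toy aligned pairs (romanized Persian tokens, English tokens); the last pair
# is deliberately misaligned to introduce a spurious candidate translation.
aligned_pairs = [
    (["ketab", "nevisandeh"], ["book", "author"]),
    (["ketab", "ketabkhaneh"], ["book", "library"]),
    (["bazi", "tim"], ["game", "team"]),
    (["ketab"], ["team"]),
]

cross = cross_language_associations(aligned_pairs)
mono = monolingual_associations([tgt for _, tgt in aligned_pairs])

candidates = translate("ketab", cross, k=4)
kept = drop_outliers(candidates, mono)
print([t for t, _ in kept])
```

In this toy run the spurious candidate introduced by the misaligned pair is filtered out because it never co-occurs with the other candidates in the English documents. A real system would need weighting, thresholds, and linguistic preprocessing that the abstract does not describe.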