Multilingual corpora are valuable resources for cross-language information retrieval and are available for many language pairs. Persian, however, lacks rich multilingual resources, owing to some of its special features and to the difficulty of constructing such corpora. In this study, we build a Persian-English comparable corpus from two independent news collections: BBC News in English and Hamshahri news in Persian. We align the documents in these collections using the similarity of their topics and their publication dates. We tried several alternatives for constructing the comparable corpus and assessed the quality of each resulting corpus using different criteria. Evaluation results show that the aligned documents are of high quality, and using the Persian-English comparable corpus to extract translation knowledge appears promising.
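The alignment idea described above (pair documents whose topics are similar and whose publication dates are close) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Doc` type, the cosine similarity measure, the `max_days` window, and the `min_sim` threshold are all hypothetical choices made for illustration, and the sketch assumes that topic terms from both languages have already been mapped into a shared vocabulary, which the abstract does not specify.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt


@dataclass
class Doc:
    """Hypothetical document record; not taken from the paper."""
    doc_id: str
    published: date
    # Assumption: topic-term weights expressed in a vocabulary shared by both languages.
    topics: dict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def align(source_docs, target_docs, max_days=1, min_sim=0.3):
    """Pair each source document with its most similar target document.

    A target is a candidate only if it was published within `max_days`
    of the source; a pair is kept only if its similarity reaches `min_sim`.
    Both parameters are illustrative assumptions.
    """
    pairs = []
    for src in source_docs:
        best, best_sim = None, 0.0
        for tgt in target_docs:
            if abs((src.published - tgt.published).days) > max_days:
                continue  # publication dates too far apart
            sim = cosine(src.topics, tgt.topics)
            if sim >= min_sim and (best is None or sim > best_sim):
                best, best_sim = tgt, sim
        if best is not None:
            pairs.append((src.doc_id, best.doc_id, best_sim))
    return pairs
```

Varying `max_days` and `min_sim` is one way to produce alternative corpora of differing size and strictness, in the spirit of the several construction alternatives the study compares.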