We present a method for creating a comparable text corpus from two document collections in different languages. The collections can differ widely in origin: in this study, we build a comparable corpus from articles by a Swedish news agency and a U.S. newspaper. The keys with the best resolution power were extracted from the documents of one collection, the source collection, using the relative average term frequency (RATF) value. The keys were translated into the language of the other collection, the target collection, with a dictionary-based query translation program. The translated queries were run against the target collection, and an alignment pair was formed when a retrieved document met given date and similarity-score criteria. The resulting comparable collection was used as a similarity thesaurus to translate queries, alongside a dictionary-based translator. The combined approach outperformed translation schemes that used dictionary-based translation or corpus translation alone.
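
The key-extraction and alignment steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the commonly cited form of RATF, (cf/df) * 10^3 / ln(df + SP)^p, where cf is the collection frequency of a key and df its document frequency. The function names, the default values of SP and p, the one-day date window, and the score threshold are all hypothetical choices made for the example. The dictionary-based translation and the retrieval run are omitted.

```python
import math
from collections import Counter
from datetime import date


def collection_stats(docs):
    """Count collection frequency (cf) and document frequency (df) per token."""
    cf, df = Counter(), Counter()
    for tokens in docs:
        cf.update(tokens)        # every occurrence counts toward cf
        df.update(set(tokens))   # each document counts once toward df
    return cf, df


def ratf(cf, df, sp=3000.0, p=3.0):
    """Assumed RATF form: (cf / df) * 10^3 / ln(df + SP)^p.

    sp and p are illustrative defaults, not values taken from this study.
    """
    return (cf / df) * 1000.0 / math.log(df + sp) ** p


def top_keys(doc_tokens, cf, df, n=5):
    """Return the n distinct tokens of one document with the highest RATF."""
    scored = {t: ratf(cf[t], df[t]) for t in set(doc_tokens) if df[t] > 0}
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scored, key=lambda t: (-scored[t], t))[:n]


def is_alignment(src_date, tgt_date, score, max_days=1, min_score=0.5):
    """Hypothetical alignment test: dates close enough and score high enough."""
    close_in_time = abs((tgt_date - src_date).days) <= max_days
    return close_in_time and score >= min_score


# Toy source collection; tokens are already normalized.
docs = [
    ["riksdag", "vote", "vote", "the"],
    ["the", "market"],
    ["the", "vote"],
]
cf, df = collection_stats(docs)
best = top_keys(docs[0], cf, df, n=1)
```

In a full pipeline, the keys returned by `top_keys` would be translated and run as a query against the target collection, and `is_alignment` would be applied to each retrieved document's date and retrieval score to decide whether to record a pair.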