Cross-Language Information Retrieval (CLIR) resources, such as dictionaries and parallel corpora, are scarce for special domains. Automatically obtaining comparable corpora for such domains could help address this scarcity, and the Web, with its vast volumes of data, is a natural source for them. We experimented with focused crawling as a means of acquiring comparable corpora in the genomics domain. The acquired corpora were used to statistically translate domain-specific words. The same words were also translated using a high-quality but non-genomics-related parallel corpus, which yielded considerably worse translations. We also evaluated our system in standard information retrieval (IR) experiments, combining statistical translation based on the Web corpora with dictionary-based translation. The combination improved on pure dictionary-based translation. Mining the Web for comparable corpora therefore seems promising.