Aligned corpora are frequently used resources in CLIR systems. The three qualities of translation corpora that most strongly affect the performance of a corpus-based CLIR system are (1) topical nearness to the translated queries, (2) the quality of the alignments, and (3) the size of the corpus. In this paper, the effects of these factors are studied and evaluated. Topics from two different domains (news and genomics) are translated with corpora of varying alignment quality, ranging from a clean parallel corpus to noisier comparable corpora. The sizes of the corpora are also varied. The results show that, of the three qualities, topical nearness is the most crucial factor, outweighing both of the others. This indicates that noisy comparable corpora should be used as complementary resources when parallel corpora are not available for the domain in question.