We report on Korean monolingual, Chinese-Korean bilingual (with English as pivot), and Chinese-English bilingual CLIR experiments in the NTCIR-4 environment, using MT software augmented with Web-based entity-oriented translation as resources. Simple stemming helps improve bigram indexing for Korean retrieval. For word indexing, keeping only nouns is preferable. Web-based translation reduces the number of terms left untranslated after MT and substantially improves CLIR results. Translation concatenation consistently improves CLIR effectiveness, and combining retrieval lists from bigram and word indexing is also helpful. A method to disambiguate multiple MT outputs using a log likelihood ratio threshold was tested. Depending on the query type (title or description), the indexing used (bigram only or a retrieval combination), and the evaluation mode (relaxed or rigid), direct bilingual CLIR returned an average precision of 71–79% (English-Korean) and 76–84% (Chinese-English) of the corresponding Korean-Korean and English-English monolingual results. Using English as a pivot, Chinese-Korean CLIR achieves about 55–65% of the effectiveness of Korean monolingual retrieval. Entity and terminology translation at the pivot-language stage accounts for a large portion of this deficiency. A topic with a comparatively poor Chinese-English bilingual result does not necessarily continue to underperform at the Korean retrieval level after further transitive translation into Korean.
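As an illustration of two techniques named above, bigram indexing and the combination of retrieval lists, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the tokenization, score normalization, or combination weights, so the function names, the min-max normalization, and the `alpha` weight are all assumptions made for this example.

```python
from collections import defaultdict


def char_bigrams(text):
    """Return overlapping character bigrams for each whitespace-separated token."""
    grams = []
    for token in text.split():
        if len(token) == 1:
            grams.append(token)  # keep single-character tokens as-is
        grams.extend(token[i:i + 2] for i in range(len(token) - 1))
    return grams


def min_max_normalize(scores):
    """Scale a {doc_id: score} mapping into [0, 1] (assumed normalization)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine_lists(bigram_scores, word_scores, alpha=0.5):
    """Linearly combine two retrieval lists.

    alpha is a hypothetical weight on the bigram list; documents missing
    from one list contribute zero from that list.
    """
    fused = defaultdict(float)
    for doc, s in min_max_normalize(bigram_scores).items():
        fused[doc] += alpha * s
    for doc, s in min_max_normalize(word_scores).items():
        fused[doc] += (1.0 - alpha) * s
    # Rank documents by fused score, highest first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `char_bigrams("한국어")` yields the two bigrams `한국` and `국어`, and `combine_lists` can merge the scores obtained from a bigram index with those from a word index into a single ranked list.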