Experiments in multilingual information retrieval using the SPIDER system
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Retrieval effectiveness of proper name search methods
Information Processing and Management: an International Journal
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Resolving ambiguity for cross-language retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Structured translation for cross-language information retrieval
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Modern Information Retrieval
UTACLIR -: general query translation framework for several language pairs
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Statistical transliteration for english-arabic cross language information retrieval
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Embedding web-based statistical translation models in cross-language information retrieval
Computational Linguistics - Special issue on web as corpus
Translating unknown queries with web corpora for cross-language information retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Report on the TREC 2004 genomics track
ACM SIGIR Forum
Translating cross-lingual spelling variants using transformation rules
Information Processing and Management: an International Journal
FITE-TRT: a high quality translation technique for OOV words
Proceedings of the 2006 ACM symposium on Applied computing
s-grams: Defining generalized n-grams for information retrieval
Information Processing and Management: an International Journal
Focused web crawling in the acquisition of comparable corpora
Information Retrieval
Effects of aligned corpus quality and size in corpus-based CLIR
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Managing misspelled queries in IR applications
Information Processing and Management: an International Journal
Hi-index | 0.00 |
In cross-language information retrieval (CLIR), novel or non-standard expressions, technical terminology, or rare proper nouns can be seen as noise when they appear in queries or in the target collection. This kind of vocabulary is often out-of-vocabulary (OOV) for dictionaries that are used to translate queries. In historic document retrieval (HDR), OCR errors and historical spelling variants cause similar problems. In this paper, three data driven approaches to these problems are presented. The two first methods, the transformation rule based translation (TRT) method and the classified s-gram method, operate on string level. With them approximate matches of a query word can be recognized from the target document collection and included into the target query. In the third method, the corpus-based approach, parallel or comparable corpora are employed to derive translation knowledge that can be used to translate OOV words. Besides the overview of the methods, three case studies highlighting their practical applications in CLIR are also presented. The methods are shown to be effective in query translation without dictionaries between closely related languages (TRT and s-grams), OOV word translation (s-grams), and boosting dictionary-based CLIR performance by way of OOV word translation (corpus based methods).