Does dictionary based bilingual retrieval work in a non-normalized index?

Authors:
Eija Airio;Kimmo Kettunen
Affiliations:
University of Tampere, Department of Information Studies, Kanslerinrinne 1, FIN-33014, Finland;University of Tampere, Department of Information Studies, Kanslerinrinne 1, FIN-33014, Finland
Venue:
Information Processing and Management: an International Journal
Year:
2009


Abstract

Many operational IR indexes are non-normalized, i.e., no lemmatization, stemming, or similar normalization has been applied during indexing. This poses a challenge for dictionary-based cross-language information retrieval (CLIR), because dictionary translations are mostly lemmas, whereas the index contains inflected word forms. In this study, we address dictionary-based CLIR in a non-normalized index. We test two alternative approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms of a given lemma. FCG has previously been tested in monolingual retrieval, where it proved to be a good method for retrieval in inflected-form indexes, especially for highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English-Finnish, English-Swedish, Swedish-Finnish and Finnish-Swedish. Both approaches performed quite well, but the results varied by language pair. S-gramming and FCG performed about equally well for all language pairs except Finnish-Swedish, where s-gramming outperformed FCG.
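
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the choice of skip distances (0 and 1), the Jaccard-style similarity, and the toy suffix list are all assumptions made for illustration. In particular, the suffix list is hand-picked here rather than derived from corpus case frequencies, and plain suffix concatenation ignores stem alternation, which a real form generator for Finnish would have to handle.

```python
def s_grams(word, skips=(0, 1)):
    """Return character digrams of `word`, each tagged with its skip distance.

    Skip 0 gives ordinary adjacent bigrams; skip 1 pairs characters that
    have exactly one character between them.
    """
    grams = set()
    for k in skips:
        for i in range(len(word) - k - 1):
            grams.add((k, word[i] + word[i + k + 1]))
    return grams


def s_gram_similarity(a, b, skips=(0, 1)):
    """Jaccard overlap of the two s-gram sets (an assumed similarity measure)."""
    ga, gb = s_grams(a, skips), s_grams(b, skips)
    if not ga and not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def best_matches(key, index_terms, top_k=3):
    """Rank index terms by s-gram similarity to a translated lemma."""
    ranked = sorted(index_terms,
                    key=lambda term: s_gram_similarity(key, term),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical suffix list standing in for "the most frequent case forms".
TOY_SUFFIXES = ["", "n", "a", "ssa", "sta"]


def generate_forms(lemma, suffixes=TOY_SUFFIXES):
    """Toy form generation: append each suffix to the lemma unchanged."""
    return [lemma + suffix for suffix in suffixes]


if __name__ == "__main__":
    index_terms = ["talossa", "auto", "kirja"]
    print(best_matches("talo", index_terms, top_k=1))
    print(generate_forms("talo"))
```

In this sketch, the approximate-matching route compares a translated lemma against the inflected terms actually present in the index, while the generation route expands the lemma into a small set of candidate forms that can be used directly as query keys.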