In this paper we show that keyword variation in a morphologically complex language, Finnish, can be handled effectively for IR purposes by generating only the textually most frequent forms of each keyword. In theory, Finnish nouns have about 2,000 different forms, but most of these forms occur rarely. Corpus statistics showed that about 84 to 88 per cent of the occurrences of inflected noun forms belong to only six of the 14 possible cases. With at most 2 × 6 = 12 variant forms per keyword (six cases, each in singular and plural), it is feasible to try them all in a search. The retrieval effectiveness of this frequent-form coverage was tested with three to twelve variant forms per keyword in two test collections, TUTK and the Finnish material of CLEF 2003. The results show that, with nine and twelve variant forms per keyword, the frequent keyword form generation method competes well with the gold standard, lemmatization.
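To make the idea concrete, the following is a minimal sketch of expanding a query keyword into its most frequent inflected forms and combining them with OR. It is not the implementation evaluated in the paper. The hand-written paradigm for the noun "talo" (house), the choice and order of the six cases in `FREQUENT_CASES`, the function names, and the OR query syntax are all illustrative assumptions; a real system would obtain the forms from a morphological generator and take the case ranking from corpus statistics.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-written partial paradigm for the Finnish noun "talo".
# Keys are (case, number); a real system would call a morphological generator.
PARADIGM: Dict[Tuple[str, str], str] = {
    ("nominative", "sg"): "talo",
    ("genitive", "sg"): "talon",
    ("partitive", "sg"): "taloa",
    ("inessive", "sg"): "talossa",
    ("elative", "sg"): "talosta",
    ("illative", "sg"): "taloon",
    ("adessive", "sg"): "talolla",  # present in the paradigm, but not in the assumed frequent set
    ("nominative", "pl"): "talot",
    ("genitive", "pl"): "talojen",
    ("partitive", "pl"): "taloja",
    ("inessive", "pl"): "taloissa",
    ("elative", "pl"): "taloista",
    ("illative", "pl"): "taloihin",
}

# Assumed set of six frequent cases; illustrative only, not taken from the abstract.
FREQUENT_CASES: List[str] = [
    "nominative", "genitive", "partitive", "inessive", "elative", "illative",
]


def frequent_forms(paradigm: Dict[Tuple[str, str], str],
                   cases: List[str],
                   limit: int = 12) -> List[str]:
    """Return up to `limit` distinct forms, singular forms first, then plural."""
    forms: List[str] = []
    for number in ("sg", "pl"):
        for case in cases:
            form = paradigm.get((case, number))
            if form is not None and form not in forms:
                forms.append(form)
    return forms[:limit]


def or_query(forms: List[str]) -> str:
    """Join the variant forms into one disjunctive query clause."""
    return "(" + " OR ".join(forms) + ")"


if __name__ == "__main__":
    print(or_query(frequent_forms(PARADIGM, FREQUENT_CASES, limit=12)))
```

Lowering `limit` mirrors the smaller variant sets (three to twelve forms) mentioned above, although the order in which forms are dropped here (singular before plural, in the listed case order) is a simplification, not a frequency ranking from the paper.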