This paper discusses information retrieval of Finnish and the management of keyword variation by generating inflected variant keyword forms. Finnish is a highly inflectional language, and thus keyword variation management of queries and query indexes is of the utmost importance for successful Finnish full-text retrieval. In the paper we show that generating a fairly small number of variant keyword forms leads to good retrieval performance with a probabilistic best-match retrieval system (Lemur). Generating almost the full paradigm of inflected nominal forms improves the results slightly. We also obtain interesting results with regard to different index types: our evaluation shows that generated inflected queries perform extremely well in a lemmatized index, which is supposedly not suitable for this query type. We also show that, in a research environment, even inexact generation that produces many incorrect inflected forms achieves high precision-recall performance without a considerable loss in query throughput. We use two different word form generators and their variants, and compare the results to two commonly used reductive methods of word form variation management: stemming and lemmatization. The paper also includes a short discussion of using the variant keyword method with Web search engines.
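The general idea of generating a small set of inflected keyword variants can be illustrated with a minimal sketch. It is not either of the generators evaluated in the paper. The function names `frequent_forms` and `expand_query`, the hand-picked set of six case forms, and the Indri-style `#syn` and `#combine` query operators are all assumptions made for illustration. The suffixation is deliberately naive: it ignores consonant gradation and stem changes, so it produces incorrect forms for many Finnish words, much like the inexact generation mentioned above.

```python
# Toy generator of a few frequent Finnish case forms (illustrative only).
# Assumption: plain suffixation with vowel harmony; no consonant gradation,
# no stem alternation, so many real words will get incorrect forms.

BACK_VOWELS = set("aou")


def frequent_forms(lemma: str) -> list[str]:
    """Return the lemma plus a handful of naively suffixed singular case forms."""
    # Vowel harmony: words with a back vowel take "a" endings, others take "ä".
    a = "a" if any(ch in BACK_VOWELS for ch in lemma) else "ä"
    forms = [
        lemma,                     # nominative
        lemma + "n",               # genitive
        lemma + a,                 # partitive (naive)
        lemma + "ss" + a,          # inessive
        lemma + "st" + a,          # elative
        lemma + lemma[-1] + "n",   # illative (naive: lengthen final vowel)
    ]
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(forms))


def expand_query(lemmas: list[str]) -> str:
    """Group each keyword's variants as synonyms in an Indri-style query string."""
    groups = ["#syn(" + " ".join(frequent_forms(w)) + ")" for w in lemmas]
    return "#combine(" + " ".join(groups) + ")"


if __name__ == "__main__":
    print(expand_query(["talo", "kylä"]))
    # prints:
    # #combine(#syn(talo talon taloa talossa talosta taloon) #syn(kylä kylän kylää kylässä kylästä kylään))
```

Grouping the variants of each keyword as synonyms lets an inflected (unlemmatized, unstemmed) index be searched without any reductive processing at indexing time, at the cost of longer queries.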