Generating variant keyword forms for a morphologically complex language leads to successful information retrieval with finnish

  • Authors:
  • Kimmo Kettunen;Paavo Arvola

  • Affiliations:
  • School of Information Sciences, University of Tampere, Finland;School of Information Sciences, University of Tampere, Finland

  • Venue:
  • IRFC'12 Proceedings of the 5th conference on Multidisciplinary Information Retrieval
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper discusses information retrieval of Finnish and keyword variation management by generating inflected variant keyword forms. Finnish is a highly inflectional language, and thus keyword variation management of queries and query indexes is of utter importance for successful Finnish full-text retrieval. In the paper we show that generation of a quite small number of variant keyword forms leads to good retrieval performance using a probabilistic best-match retrieval system (Lemur). Generation of almost the full paradigm of inflected nominal forms improves the results slightly. We have also interesting results with regards to different index types: our evaluation shows that generated inflected queries behave extremely well in a lemmatized index, which is supposedly not suitable for this query type. We also show that in a research environment even inexact generation that produces lots of incorrect inflected forms achieves high precision-recall performance without considerable loss in query throughput effectiveness. We use two different word form generators and their variants and compare the results to commonly used reductive word form variation management methods, stemming and lemmatization. The paper includes also a short discussion about usage of the variant keyword method with Web search engines.