Swedish full text retrieval: Effectiveness of different combinations of indexing strategies with query terms

  • Authors:
  • Per Ahlgren;Jaana Kekäläinen

  • Affiliations:
  • Swedish School of Library and Information Science, University College of Borås, Borås, Sweden 501 90;Department of Information Studies, 33014 University of Tampere, Tampere, Finland

  • Venue:
  • Information Retrieval
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, which treats Swedish full text retrieval, the problem of morphological variation of query terms in the document database is studied. The Swedish CLEF 2003 test collection was used, and the effects of combination of indexing strategies with query terms on retrieval effectiveness were studied. Four of the seven tested combinations involved indexing strategies that used normalization, a form of conflation. All of these four combinations employed compound splitting, both during indexing and at query phase. SWETWOL, a morphological analyzer for the Swedish language, was used for normalization and compound splitting. A fifth combination used stemming, while a sixth attempted to group related terms by right hand truncation of query terms. The truncation was performed by a search expert. These six combinations were compared to each other and to a baseline combination, where no attempt was made to counteract the problem of morphological variation of query terms in the document database. Both the truncation combination, the four combinations based on normalization and the stemming combination outperformed the baseline. Truncation had the best performance. The main conclusion of the paper is that truncation, normalization and stemming enhanced retrieval effectiveness in comparison to the baseline. Further, normalization and stemming were not far below truncation.