Statistical vs. rule-based stemming for monolingual French retrieval

  • Authors:
  • Prasenjit Majumder;Mandar Mitra;Kalyankumar Datta

  • Affiliations:
  • Indian Statistical Institute, Kolkata;Indian Statistical Institute, Kolkata;Dept. of EE, Jadavpur University, Kolkata

  • Venue:
  • CLEF'06 Proceedings of the 7th international conference on Cross-Language Evaluation Forum: evaluation of multilingual and multi-modal information retrieval
  • Year:
  • 2006

Abstract

This paper describes our approach to the 2006 Ad Hoc monolingual information retrieval task for French. The goal of our experiment was to compare the performance of a proposed statistical stemmer with that of a rule-based stemmer, specifically the French version of Porter's stemmer. The statistical stemming approach is based on lexicon clustering, using a novel string distance measure. We submitted three official runs in addition to a baseline run that uses no stemming. As expected, the results show that stemming significantly improves retrieval performance, by about 9-10%, and that the statistical stemmer performs comparably to the rule-based stemmer.
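The abstract does not give the distance measure or the clustering procedure, so what follows is only a minimal sketch of how lexicon-clustering stemming can work, under assumptions that are not taken from the paper. `prefix_distance` is a hypothetical measure that rewards long shared prefixes and is infinite when two words differ in their first character. Words are grouped by complete-linkage agglomerative clustering with a hypothetical `threshold`. Each cluster's longest common prefix then serves as the stem for all of its members.

```python
import math
import os
from itertools import combinations


def prefix_distance(x: str, y: str) -> float:
    """Hypothetical distance that rewards long shared prefixes (an assumption).

    Both strings are padded with a sentinel to the same length n. If the first
    mismatch is at 0-based position m, positions m..n-1 are weighted by
    1 / 2**(i - m) and the sum is scaled by (n - m) / m. Words that differ in
    their first character get an infinite distance.
    """
    n = max(len(x), len(y))
    xs, ys = x.ljust(n, "\0"), y.ljust(n, "\0")
    m = next((i for i in range(n) if xs[i] != ys[i]), n)
    if m == n:
        return 0.0  # identical strings
    if m == 0:
        return math.inf  # no common prefix at all
    tail = sum(1 / 2 ** (i - m) for i in range(m, n))
    return (n - m) / m * tail


def complete_linkage(words, threshold):
    """Naive agglomerative clustering with complete linkage.

    Repeatedly merges the two clusters whose farthest pair of members is
    closest, and stops once that distance reaches the threshold.
    """
    clusters = [[w] for w in sorted(set(words))]

    def linkage(a, b):
        return max(prefix_distance(u, v) for u in a for v in b)

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d >= threshold:
            break
        clusters[i].extend(clusters[j])  # i < j, so index i stays valid
        del clusters[j]
    return clusters


def build_stemmer(lexicon, threshold):
    """Map each lexicon word to its cluster's longest common prefix."""
    stem_of = {}
    for cluster in complete_linkage(lexicon, threshold):
        # os.path.commonprefix compares character by character.
        stem = os.path.commonprefix(cluster)
        for w in cluster:
            stem_of[w] = stem
    return stem_of


lexicon = ["chanter", "chanteur", "chantons", "chant",
           "manger", "mangeons", "mangé"]
print(build_stemmer(lexicon, threshold=2.0))
```

With `threshold=2.0`, the four "chant-" forms map to `chant` and the three "mang-" forms map to `mang`. Lowering the threshold to 1.5 leaves `mangeons` in a cluster of its own, which shows how the threshold trades over-stemming against under-stemming. The pairwise loop is far too slow for a real collection lexicon and is meant only to make the idea concrete.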