A novel method for stemmer generation based on hidden Markov models

  • Authors:
  • Massimo Melucci;Nicola Orio

  • Affiliations:
  • University of Padova, Padova, Italy;University of Padova, Padova, Italy

  • Venue:
  • CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
  • Year:
  • 2003

Abstract

In this paper, we present a method based on Hidden Markov Models (HMMs) to generate statistical stemmers. Using a list of words as a training set, the method estimates the HMM parameters, which are then used to calculate the most probable stem for an arbitrary word. Stemming is performed by computing the most probable path through the HMM states for the input word. Neither linguistic knowledge nor a training set of manually stemmed words is required. We describe the method and report the results of experiments carried out on standard test collections for five different languages.
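
The idea of reading a stem off the most probable state path can be illustrated with a small Viterbi decoder. The sketch below is not the authors' model: the abstract does not specify the HMM topology or parameter values, so the two-state layout (`STEM`, `SUFFIX`), the hand-picked probabilities, the `SUFFIX_LETTERS` set, and the function name `viterbi_stem` are all assumptions made for illustration. In the method described above the parameters would be estimated from a word list rather than fixed by hand.

```python
import math
import string

STEM, SUFFIX = 0, 1
STATES = (STEM, SUFFIX)
NEG_INF = float("-inf")


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


# Hypothetical, hand-picked parameters (NOT from the paper).
SUFFIX_LETTERS = set("ingsed")           # letters the SUFFIX state favours
START = [log(1.0), log(0.0)]             # every word begins in STEM
TRANS = [[log(0.5), log(0.5)],           # STEM -> STEM, STEM -> SUFFIX
         [log(0.0), log(1.0)]]           # SUFFIX never returns to STEM


def emit(state, ch):
    """Log emission probability of letter ch in the given state."""
    if state == STEM:
        return log(1 / 26)               # uniform over a-z
    # 6 favoured letters at 0.15 each, 20 others at 0.005 each (sums to 1)
    return log(0.15 if ch in SUFFIX_LETTERS else 0.005)


def viterbi_stem(word):
    """Return the letters assigned to STEM on the most probable path."""
    word = word.lower()
    if not word or any(ch not in string.ascii_lowercase for ch in word):
        return word                      # outside the toy alphabet: leave as-is

    # score[s]: best log-probability of any path ending in state s
    score = [START[s] + emit(s, word[0]) for s in STATES]
    back = []                            # back[t][s]: best predecessor of s
    for ch in word[1:]:
        new_score, pointers = [], []
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + TRANS[p][s])
            new_score.append(score[prev] + TRANS[prev][s] + emit(s, ch))
            pointers.append(prev)
        score = new_score
        back.append(pointers)

    # Backtrack from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()

    return "".join(ch for ch, s in zip(word, path) if s == STEM)
```

With these toy parameters, `viterbi_stem("walking")` returns `"walk"` and `viterbi_stem("jump")` returns `"jump"`. Because the `SUFFIX` state can never transition back to `STEM`, the stem is always a prefix of the word, and the split point is simply where the decoded path leaves the `STEM` state.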