A methodology for building simple but robust stemmers without language knowledge: overview, data model and ranking algorithm

  • Authors: Nikitas N. Karanikolas
  • Affiliations: Technological Educational Institute (TEI) of Athens
  • Venue: Proceedings of the 14th International Conference on Computer Systems and Technologies
  • Year: 2013

Abstract

The purpose of this work is to define a methodology for building simple but robust stemmers without knowledge of the stemmer's target language. The target stemmer is based on conditional suffix replacement (in practice, suffix removal) applied in one or more steps. The building process, which refines the stemmer, uses the arguments that experts raise against the results of a primary stemmer. The experts need not even be speakers of the target language: they have available the original words, their translations (into the experts' native language) and the stems produced by the primary stemmer. The only language resources required are a list of suffixes used in the target language and translations of the terms that occur in a corpus of texts in the target language.
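
As a rough illustration of what "conditional suffix replacement in one or more steps" can look like, the following minimal Python sketch applies an ordered list of rule sets to a word. It is not the paper's data model or ranking algorithm. The names `SuffixRule`, `apply_step`, `stem` and `STEPS` are hypothetical, the condition used (a minimum number of characters left in the stem) is an assumption, and the English suffixes are placeholders chosen only for readability.

```python
from typing import List, NamedTuple


class SuffixRule(NamedTuple):
    """Hypothetical rule: strip `suffix`, append `replacement`, if the condition holds."""
    suffix: str        # suffix to match at the end of the word
    replacement: str   # "" means plain suffix removal
    min_stem_len: int  # assumed condition: characters that must remain after stripping


def apply_step(word: str, rules: List[SuffixRule]) -> str:
    """Try rules longest suffix first; apply the first one whose condition holds."""
    for rule in sorted(rules, key=lambda r: len(r.suffix), reverse=True):
        if word.endswith(rule.suffix):
            stem_part = word[: len(word) - len(rule.suffix)]
            if len(stem_part) >= rule.min_stem_len:
                return stem_part + rule.replacement
    return word  # no rule applied in this step


def stem(word: str, steps: List[List[SuffixRule]]) -> str:
    """Run the word through each step in order; each step applies at most one rule."""
    for rules in steps:
        word = apply_step(word, rules)
    return word


# Placeholder rule sets (illustrative only, not taken from the paper).
STEPS = [
    [SuffixRule("s", "", 3)],
    [SuffixRule("ing", "", 3), SuffixRule("ed", "", 3)],
]

if __name__ == "__main__":
    for w in ["walkings", "jumped", "sing", "bus"]:
        print(w, "->", stem(w, STEPS))
```

In a sketch like this, refining the stemmer would amount to editing the rule sets and their conditions in response to expert feedback on the stems a first version produces.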