A methodology for building simple but robust stemmers without language knowledge: overview, data model and ranking algorithm

Author: Nikitas N. Karanikolas
Affiliation: Technological Educational Institute (TEI) of Athens
Venue: Proceedings of the 14th International Conference on Computer Systems and Technologies
Year: 2013

Abstract

The purpose of this work is to define a methodology for building simple but robust stemmers without requiring knowledge of the stemmer's target language. The target stemmer is based on conditional suffix replacement (in practice, suffix removal) applied in one or more steps. The building process, which refines the stemmer, relies on the arguments that experts raise against the results of a primary stemmer. The experts need not even be speakers of the target language: they are given the original words, their translations (into the experts' native language), and the stems produced by the primary stemmer. The only language resources required are a list of suffixes used in the target language and translations of the terms that occur in a corpus of target-language texts.
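The abstract does not give the stemmer's actual rules or the conditions attached to each suffix. The following is therefore only a minimal Python sketch of multi-step conditional suffix removal, built on stated assumptions. The two suffix lists in `STEPS` are illustrative English-like endings, and the removal condition is an assumed minimum stem length (`MIN_STEM`). `suspicious_groups` is a hypothetical helper that shows how translations could expose stems that conflate words with different meanings, which is the kind of result an expert reviewer might argue against. It assumes each translation is supplied as a base form (lemma) in the reviewer's language.

```python
from collections import defaultdict

# Hypothetical, illustrative suffix lists: one list per stemming step.
# These are NOT taken from the paper; a real list would come from the
# target language's suffix inventory.
STEPS = [
    ["ies", "es", "s"],  # step 1: plural-like endings (illustrative)
    ["ing", "ed"],       # step 2: verbal endings (illustrative)
]
# Assumed condition for removal: the remaining stem must keep at least
# this many characters. The paper's real conditions are not specified here.
MIN_STEM = 3


def strip_once(word, suffixes, min_stem=MIN_STEM):
    """Remove the longest matching suffix if the remaining stem is long enough."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def stem(word, steps=STEPS):
    """Apply the steps in order; each step removes at most one suffix."""
    for suffixes in steps:
        word = strip_once(word, suffixes)
    return word


def suspicious_groups(translations, steps=STEPS):
    """Group words by stem and return the groups whose translations disagree.

    `translations` maps each target-language word to a base-form gloss in
    the reviewer's language (hypothetical input format). A group whose
    members have different glosses is a candidate over-stemming error.
    """
    groups = defaultdict(list)
    for word in translations:
        groups[stem(word, steps)].append(word)
    return {
        s: sorted(ws)
        for s, ws in groups.items()
        if len({translations[w] for w in ws}) > 1
    }


# Toy input: target words mapped to glosses in the reviewer's language.
sample = {
    "walks": "WALK",
    "walking": "WALK",
    "walked": "WALK",
    "news": "NEWS",
    "new": "NEW",
}
print(suspicious_groups(sample))  # {'new': ['new', 'news']}
```

In this toy run, all three forms of "walk" share a stem and a gloss, so they are not flagged. By contrast, "news" is over-stemmed to "new", and the differing glosses expose the group. A reviewer could then, for example, argue for a stricter condition on the `s` suffix.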