STEMBR: a stemming algorithm for the Brazilian Portuguese language

  • Authors:
  • Reinaldo Viana Alvares; Ana Cristina Bicharra Garcia; Inhaúma Ferraz

  • Affiliations:
  • Instituto de Computação, UFF – Universidade Federal Fluminense, São Domingos, Niterói, RJ;Instituto de Computação, UFF – Universidade Federal Fluminense, São Domingos, Niterói, RJ;Instituto de Computação, UFF – Universidade Federal Fluminense, São Domingos, Niterói, RJ

  • Venue:
  • EPIA'05 Proceedings of the 12th Portuguese conference on Progress in Artificial Intelligence
  • Year:
  • 2005

Abstract

Stemming algorithms have traditionally been used in information retrieval systems because they produce a more concise word representation. However, the efficiency of these algorithms varies with the language to which they are applied. This paper presents STEMBR, a stemmer for Brazilian Portuguese in which suffix treatment is based on a statistical study of the frequency of the last letter of words found in Brazilian web pages. The proposed stemmer is compared with another algorithm developed specifically for Portuguese. The results show the efficiency of our stemmer.
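
The kind of last-letter frequency study mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the STEMBR algorithm itself, whose rules are not given here; the function name `last_letter_frequencies`, the tokenization regex, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter


def last_letter_frequencies(text):
    """Return the relative frequency of each word-final letter in `text`.

    Hypothetical helper for illustration only; it is not part of STEMBR.
    """
    # Extract runs of letters (including accented ones), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    # Count how often each letter appears in word-final position.
    counts = Counter(word[-1] for word in words)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Most frequent final letters come first.
    return {letter: count / total for letter, count in counts.most_common()}


if __name__ == "__main__":
    # Tiny made-up sample; a real study would use a large web corpus.
    sample = "casa casas menino meninos falar"
    for letter, freq in last_letter_frequencies(sample).items():
        print(letter, round(freq, 2))
```

A ranking like this would only suggest which final letters deserve suffix rules first; how STEMBR actually turns the statistics into rules is described in the paper, not in this sketch.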