An unsupervised method to improve Spanish stemmer

  • Authors:
  • Antonio Fernández;Josval Díaz;Yoan Gutiérrez;Rafael Muñoz

  • Affiliations:
  • Departamento de Informática, Universidad de Matanzas, Matanzas, Cuba;Departamento de Informática, Universidad de Matanzas, Matanzas, Cuba;Departamento de Informática, Universidad de Matanzas, Matanzas, Cuba;Departamento de Lenguaje y Sistemas Informáticos, Universidad de Alicante, Spain

  • Venue:
  • NLDB'11 Proceedings of the 16th international conference on Natural language processing and information systems
  • Year:
  • 2011

Abstract

We evaluate the effectiveness of our edit distance algorithm in improving an unsupervised, language-independent stemming method. The main idea is to create morphological families by automatically grouping words according to our distance. The stemming process is then based on that grouping. We evaluated both the ability of the edit distance algorithm to cluster words and the ability of our method to generate the correct stem for Spanish. The results are good: 98% precision for the creation of morphological families and 99.85% correct stemming.
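
The two-step idea in the abstract (group words into morphological families by edit distance, then derive a stem per family) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the authors' distance, so the standard Levenshtein distance stands in for it, and the greedy grouping rule, the 0.5 threshold, the longest-common-prefix stem, and the function names `levenshtein`, `group_families`, and `stem_table` are all hypothetical rather than taken from the paper.

```python
from os.path import commonprefix  # character-wise common prefix of a list of strings


def levenshtein(a: str, b: str) -> int:
    """Standard Levenshtein distance (a stand-in, NOT the authors' distance)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def group_families(words, threshold=0.5):
    """Greedy grouping (assumed rule): a word joins the first family whose
    first member is within the length-normalized distance threshold."""
    families = []
    for word in words:
        for family in families:
            representative = family[0]
            longest = max(len(word), len(representative))
            if levenshtein(word, representative) / longest <= threshold:
                family.append(word)
                break
        else:
            # No existing family is close enough, so start a new one.
            families.append([word])
    return families


def stem_table(families):
    """Assumed stemming rule: the stem is the family's longest common prefix."""
    return {word: commonprefix(family) for family in families for word in family}
```

Calling `stem_table(group_families(words))` on a small Spanish word list returns a dictionary from each word to the stem of its family. A longest-common-prefix stem is only one simple choice; it handles suffix variation but not irregular forms, which is presumably where a purpose-built distance would matter.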