A Word Stemming Algorithm for the Spanish Language

  • Authors:
  • A. Honrado; R. Leon; R. O'Donnel; D. Sinclair

  • Venue:
  • SPIRE '00 Proceedings of the Seventh International Symposium on String Processing Information Retrieval (SPIRE'00)
  • Year:
  • 2000

Abstract

The paper describes a word stemming algorithm for the Spanish language. Document retrieval experiments on English text suggest that word stemming based on morphological analysis does not generally or consistently outperform ad hoc, hand-tuned algorithms such as the one proposed by M. Porter (1980). It is difficult, however, to produce a Porter-style algorithm for a Romance language such as Spanish, both because of its greater grammatical complexity and because inflection often changes the root of a word, not just its ending (as is mostly the case in English). In general terms, the difficulty lies in producing an algorithm that copes with the additional complexity of Spanish morphology whilst preserving the simplicity of a Porter-style algorithm. One such algorithm is presented. It combines dictionary look-ups with some 300 stemming and intermediate reduction rules.
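
The abstract does not list the rules or the dictionary, so the following is only a minimal sketch of the general idea of pairing a dictionary look-up with ordered suffix rules. Everything in it is an assumption made for illustration: the names `IRREGULAR_FORMS`, `SUFFIX_RULES`, `MIN_STEM_LENGTH` and `stem`, the handful of entries, and the minimum stem length are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the lexicon and rules below are invented for this
# example and are NOT the roughly 300 rules or the dictionary used in the paper.

# Hypothetical dictionary for forms whose root changes under inflection,
# which suffix stripping alone cannot recover.
IRREGULAR_FORMS = {
    "fui": "ir",
    "tuvo": "tener",
    "dijeron": "decir",
}

# Hypothetical (suffix, replacement) rules, tried in order, longest first.
SUFFIX_RULES = [
    ("aciones", ""),
    ("ación", ""),
    ("mente", ""),
    ("ces", "z"),   # e.g. "luces" -> "luz"
    ("es", ""),
    ("s", ""),
]

# Assumed guard so that very short words are not over-stemmed.
MIN_STEM_LENGTH = 3


def stem(word: str) -> str:
    """Return a stem using the dictionary first, then the first suffix rule
    that matches and leaves a long enough stem."""
    token = word.lower()

    # Step 1: dictionary look-up handles root-changing (irregular) forms.
    if token in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[token]

    # Step 2: apply the first matching suffix rule that keeps the stem
    # at or above the minimum length.
    for suffix, replacement in SUFFIX_RULES:
        if token.endswith(suffix):
            candidate = token[: len(token) - len(suffix)] + replacement
            if len(candidate) >= MIN_STEM_LENGTH:
                return candidate

    # No rule applied: return the word unchanged.
    return token
```

In this sketch, `stem("tuvo")` is resolved by the look-up table, while `stem("luces")` falls through to the suffix rules. Checking the dictionary first reflects the point made in the abstract that Spanish inflection often alters the root, which ending-based rules alone would miss.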