A Word Stemming Algorithm for the Spanish Language

Authors:
A. Honrado;R. Leon;R. O'Donnel;D. Sinclair
Venue:
SPIRE '00 Proceedings of the Seventh International Symposium on String Processing and Information Retrieval (SPIRE'00)
Year:
2000

Citing 0
Cited 3

Stemming Galician Texts
SPIRE 2002 Proceedings of the 9th International Symposium on String Processing and Information Retrieval

Machines in the conversation: detecting themes and trends in informal communication streams
IBM Systems Journal

Diachronic stemmed corpus and dictionary of Galician language
CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing

Abstract

The paper describes a word stemming algorithm for the Spanish language. Experiments in document retrieval on English text suggest that word stemming based on morphological analysis does not generally or consistently outperform ad hoc, hand-tuned algorithms such as that proposed by M. Porter (1980). It is difficult, however, to produce a Porter-style algorithm for a Romance language such as Spanish, owing to its greater grammatical complexity and to the fact that inflection often changes the root of a word, not just its ending (as is mostly the case in English). In general terms, the difficulty lies in producing an algorithm that can cope with the additional complexity of Spanish morphology whilst preserving the simplicity of a Porter-style algorithm. One such algorithm is presented. It combines dictionary look-ups with some 300 stemming and intermediate reduction rules.
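
As a rough illustration of the general idea of pairing a dictionary look-up with ordered suffix rules, a minimal Python sketch might look like the following. The abstract does not list the paper's rules or dictionary, so every entry below (the IRREGULAR table, the DERIVATIONAL and INFLECTIONAL rule lists, the minimum stem lengths, and the two-stage order) is an invented assumption for demonstration and should not be read as the authors' algorithm.

```python
# Hypothetical sketch only: the dictionary entries and suffix rules below are
# invented for illustration and are NOT taken from the paper.

# Dictionary look-up for irregular forms whose root changes under inflection.
IRREGULAR = {
    "tengo": "ten",
    "tienes": "ten",
    "tuve": "ten",
}

# Each rule is (suffix, minimum length of the stem left after stripping).
# Stage 1: an intermediate reduction that strips one derivational suffix.
DERIVATIONAL = [
    ("aciones", 3),
    ("ación", 3),
    ("mente", 3),
]

# Stage 2: strips one inflectional ending; longer suffixes are tried first.
INFLECTIONAL = [
    ("ando", 2),
    ("iendo", 2),
    ("es", 3),
    ("as", 3),
    ("os", 3),
    ("ar", 2),
    ("er", 2),
    ("ir", 2),
    ("a", 3),
    ("o", 3),
    ("s", 3),
]


def apply_first(word, rules):
    """Strip the first matching suffix that leaves a long enough stem."""
    for suffix, min_len in rules:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            return word[: len(word) - len(suffix)]
    return word


def stem(word):
    """Return a crude stem: dictionary first, then two rule stages."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    w = apply_first(w, DERIVATIONAL)
    return apply_first(w, INFLECTIONAL)
```

In this sketch the dictionary handles forms such as "tuve", where the root itself changes and no suffix rule could recover it, while regular forms such as "hablando" and "hablar" fall through to the suffix rules and share one stem. Each stage applies at most one rule, which keeps the toy rule set from stripping a word repeatedly.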