Towards the automatic lemmatization of 16th century Mexican Spanish: a stemming scheme for the CHEM

  • Authors:
  • Alfonso Medina-Urrea

  • Affiliations:
  • GIL, Instituto de Ingeniería, UNAM, Coyoacán, DF, Mexico

  • Venue:
  • CICLing'06 Proceedings of the 7th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2006

Abstract

Two problems arise when developing a stemming scheme for diachronic corpora: (1) the morphological systems of natural languages change over time, and these changes are usually not sufficiently documented; and (2) such corpora exhibit highly diverse orthographic characteristics. This short paper briefly describes a stemming strategy for a diachronic corpus of Mexican Spanish that partially addresses these problems. The method's success rates are compared with those of a Porter stemmer.