Robust ending guessing rules with application to Slavonic languages

  • Authors:
  • Preslav Nakov; Elena Paskaleva

  • Affiliations:
  • University of California, Berkeley, CA; Bulgarian Academy of Sciences, Sofia, Bulgaria

  • Venue:
  • ROMAND '04 Proceedings of the 3rd Workshop on RObust Methods in Analysis of Natural Language Data
  • Year:
  • 2004

Abstract

The paper studies the automatic extraction of diagnostic word endings for Slavonic languages, with the aim of determining certain grammatical, morphological and semantic properties of the underlying word. In particular, ending guessing rules are learned from a large morphological dictionary of Bulgarian in order to predict POS, gender, number, article and semantics. A simple, exact, high-accuracy algorithm is developed and compared to an approximate one that uses a scoring function previously proposed by Mikheev for POS guessing. It is shown how the number of rules of the latter can be reduced by a factor of up to 35 without sacrificing performance. The evaluation demonstrates coverage close to 100% and precision of 97-99% for the approximate algorithm.
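
To make the idea of an ending guessing rule concrete, here is a minimal Python sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes that an "exact" rule is an ending for which every dictionary word carrying it has the same tag. The function names, the `max_len` and `min_count` parameters, the tag labels and the toy English lexicon are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def learn_exact_rules(lexicon, max_len=4, min_count=2):
    """Keep an ending only if every word carrying it has the same tag.

    Assumption: this is one plausible reading of an "exact" rule,
    not the procedure described in the paper.
    """
    tags_by_ending = defaultdict(set)
    count_by_ending = defaultdict(int)
    for word, tag in lexicon:
        # Leave at least one character of stem before the ending.
        for k in range(1, min(max_len, len(word) - 1) + 1):
            ending = word[-k:]
            tags_by_ending[ending].add(tag)
            count_by_ending[ending] += 1
    return {
        ending: next(iter(tags))
        for ending, tags in tags_by_ending.items()
        if len(tags) == 1 and count_by_ending[ending] >= min_count
    }


def guess(word, rules, max_len=4):
    """Return the tag of the longest matching ending, or None."""
    for k in range(min(max_len, len(word) - 1), 0, -1):
        tag = rules.get(word[-k:])
        if tag is not None:
            return tag
    return None


# Toy lexicon (hypothetical data, not taken from the paper).
toy_lexicon = [
    ("walking", "VERB"),
    ("talking", "VERB"),
    ("quickly", "ADV"),
    ("slowly", "ADV"),
    ("ring", "NOUN"),
]

rules = learn_exact_rules(toy_lexicon)
print(rules)
print(guess("barking", rules))  # VERB (matches the ending "king")
print(guess("sadly", rules))    # ADV (matches the ending "ly")
print(guess("sing", rules))     # None ("ing" is ambiguous in the toy data)
```

In this sketch the ending "ing" is rejected because the toy lexicon contains it with two different tags, while the longer ending "king" survives. An approximate algorithm of the kind the abstract mentions would instead keep ambiguous endings whose score exceeds a threshold. That trade-off between coverage and precision is what the evaluation figures in the abstract describe.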