Modeling morphologically rich languages using split words and unstructured dependencies

  • Authors: Deniz Yuret; Ergun Biçici
  • Affiliations: Koç University, Sariyer, Istanbul, Turkey (both authors)
  • Venue: ACLShort '09: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
  • Year: 2009

Abstract

We experiment with splitting words into their stem and suffix components for modeling morphologically rich languages. We show that using a morphological analyzer and disambiguator results in a significant perplexity reduction for Turkish. We present flexible n-gram models, Flex-Grams, which assume that the n - 1 tokens that determine the probability of a given token can be chosen from anywhere in the sentence, rather than only from the preceding n - 1 positions. Our final model achieves a 27% perplexity reduction compared to the standard n-gram model.
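
The difference between a standard n-gram context and a flexible one can be sketched in a few lines of Python. This is only an illustration of the idea stated in the abstract, not the authors' implementation: the function names, the `<s>` padding symbol, the `+` prefix marking suffix tokens, and the example segmentation are all assumptions made for this sketch.

```python
from typing import Sequence, Tuple

PAD = "<s>"  # hypothetical padding symbol for positions outside the sentence


def context_at_offsets(tokens: Sequence[str], i: int,
                       offsets: Sequence[int]) -> Tuple[str, ...]:
    """Return the conditioning tokens for tokens[i] at the given relative offsets.

    A flexible model is free to pick any offsets within the sentence;
    positions that fall outside the sentence are replaced by PAD.
    """
    context = []
    for offset in offsets:
        j = i + offset
        context.append(tokens[j] if 0 <= j < len(tokens) else PAD)
    return tuple(context)


def standard_context(tokens: Sequence[str], i: int, n: int) -> Tuple[str, ...]:
    """Standard n-gram context: the n - 1 tokens immediately preceding tokens[i]."""
    return context_at_offsets(tokens, i, range(-(n - 1), 0))


# Illustrative split-word sentence: each word is written as a stem token
# followed by a suffix token (assumed notation, not taken from the paper).
tokens = ["ev", "+lerde", "kal", "+dım"]

# Trigram context for the last suffix: the two preceding tokens.
trigram_ctx = standard_context(tokens, 3, 3)

# Flexible context for the same suffix: its own stem and the previous stem,
# skipping the intervening suffix token.
flexible_ctx = context_at_offsets(tokens, 3, (-3, -1))
```

Here the standard trigram context for the final suffix contains the previous word's suffix and the current stem, while the flexible context conditions on the two stems instead. Which offsets work best for stems and for suffixes is an empirical question that this sketch does not address.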