Morphological Disambiguation of Turkish Text with Perceptron Algorithm

  • Authors:
  • Haşim Sak; Tunga Güngör; Murat Saraçlar

  • Affiliations:
  • Dept. of Computer Engineering, Boğaziçi University, Bebek, 34342, Istanbul, Turkey; Dept. of Computer Engineering, Boğaziçi University, Bebek, 34342, Istanbul, Turkey; Dept. of Electrical and Electronic Engineering, Boğaziçi University, Bebek, 34342, Istanbul, Turkey

  • Venue:
  • CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2009

Abstract

This paper describes the application of the perceptron algorithm to the morphological disambiguation of Turkish text. Turkish has a productive derivational morphology. Because of the ambiguity caused by this complex morphology, a word may have multiple morphological parses, each with a different stem or sequence of morphemes. The methodology is based on ranking with the perceptron algorithm, which has been successful in some NLP tasks in English. We use a baseline statistical trigram-based model from previous work to enumerate an n-best list of candidate morphological parse sequences for each sentence. We then apply the perceptron algorithm to rerank the n-best list using a set of 23 features. The perceptron trained for morphological disambiguation improves the accuracy of the baseline model from 93.61% to 96.80%. When we train the perceptron as a POS tagger, the accuracy is 98.27%. The Turkish morphological disambiguation and POS tagging results that we obtained are the best reported so far.
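
The reranking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a plain (unaveraged) perceptron, the feature names are hypothetical and do not correspond to the paper's 23 features, and it assumes that each training sentence comes with the index of the correct candidate in its n-best list (in practice, the closest candidate to the gold parse would have to stand in when the gold sequence is missing from the list).

```python
from collections import defaultdict


def score(weights, features):
    """Linear score of one candidate: dot product of weights and features."""
    return sum(weights[name] * value for name, value in features.items())


def rerank(weights, candidates):
    """Return the index of the highest-scoring candidate.

    Ties go to the earliest candidate, i.e. the baseline model's ranking.
    """
    return max(range(len(candidates)), key=lambda i: score(weights, candidates[i]))


def train(training_set, epochs=5):
    """Perceptron reranker training.

    training_set: list of (candidates, gold_index) pairs, where candidates is
    an n-best list of feature dicts (hypothetical features) and gold_index
    marks the correct parse sequence (an assumption of this sketch).
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for candidates, gold_index in training_set:
            predicted = rerank(weights, candidates)
            if predicted != gold_index:
                # Move weights toward the gold candidate's features ...
                for name, value in candidates[gold_index].items():
                    weights[name] += value
                # ... and away from the wrongly preferred candidate's features.
                for name, value in candidates[predicted].items():
                    weights[name] -= value
    return weights


# Toy 2-best list for one sentence; feature names and values are made up.
toy_training_set = [
    (
        [
            {"baseline_logprob": -1.0, "trigram:Noun+Verb+Noun": 1.0},
            {"baseline_logprob": -1.5, "trigram:Noun+Noun+Verb": 1.0},
        ],
        1,  # the second candidate is the correct one in this toy example
    )
]

weights = train(toy_training_set)
best = rerank(weights, toy_training_set[0][0])
```

In this toy case the baseline prefers the first candidate, and a single perceptron update is enough for the reranker to select the second one instead. Including the baseline model's score as one of the features, as done here, lets the reranker fall back on the baseline ranking when the other features are uninformative.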