Resources for Turkish morphological processing

  • Authors:
  • Haşim Sak;Tunga Güngör;Murat Saraçlar

  • Affiliations:
  • Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey 34342;Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey 34342;Department of Electrical & Electronic Engineering, Boğaziçi University, Istanbul, Turkey 34342

  • Venue:
  • Language Resources and Evaluation
  • Year:
  • 2011

Abstract

We present a set of language resources and tools (a morphological parser, a morphological disambiguator, and a text corpus) for exploiting Turkish morphology in natural language processing applications. The morphological parser is a state-of-the-art implementation of Turkish morphology based on finite-state transducers. The disambiguator is based on the averaged perceptron algorithm and has the best accuracy reported for Turkish in the literature. The text corpus was compiled from the web and contains about 500 million tokens, making it the largest Turkish web corpus published.
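
To illustrate the averaged perceptron mentioned in the abstract, the following is a minimal sketch of how such a model might choose among candidate morphological parses. It is not the authors' implementation: the function names, the tag-count feature map, and the toy training example are all assumptions made for illustration, and a real disambiguator would also draw features from the surrounding words.

```python
def features(parse):
    # Hypothetical feature map: count each "+"-separated tag in the parse.
    # A real disambiguator would also use features from neighbouring words.
    feats = {}
    for tag in parse.split("+"):
        feats[tag] = feats.get(tag, 0) + 1
    return feats


def score(weights, parse):
    # Linear score: dot product of the weight vector and the feature counts.
    return sum(weights.get(f, 0.0) * v for f, v in features(parse).items())


def train_averaged_perceptron(data, epochs=5):
    """data: list of (candidate_parses, index_of_correct_parse) pairs."""
    weights = {}   # current weight vector
    totals = {}    # sum of the weight vector after every training example
    steps = 0
    for _ in range(epochs):
        for candidates, gold in data:
            predicted = max(range(len(candidates)),
                            key=lambda i: score(weights, candidates[i]))
            if predicted != gold:
                # Standard perceptron update: reward the correct parse,
                # penalise the wrongly predicted one.
                for f, v in features(candidates[gold]).items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in features(candidates[predicted]).items():
                    weights[f] = weights.get(f, 0.0) - v
            steps += 1
            for f, w in weights.items():
                totals[f] = totals.get(f, 0.0) + w
    # Averaging over all steps reduces sensitivity to the last few updates.
    return {f: t / steps for f, t in totals.items()}


def disambiguate(weights, candidates):
    # Return the highest-scoring candidate parse.
    return max(candidates, key=lambda p: score(weights, p))


# Toy example (illustrative only): two candidate analyses of one word form,
# with the second marked as correct.
candidates = ["ev+Noun+A3sg+Pnon+Acc", "ev+Noun+A3sg+P3sg+Nom"]
model = train_averaged_perceptron([(candidates, 1)])
best = disambiguate(model, candidates)
```

Returning the average of the weight vectors seen during training, rather than the final vector, is what distinguishes the averaged perceptron from the plain perceptron.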