Learning morphological disambiguation rules for Turkish

Authors:
Deniz Yuret;Ferhan Türe
Affiliations:
Koç University, Istanbul, Turkey;Koç University, Istanbul, Turkey
Venue:
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Year:
2006



Abstract

In this paper, we present a rule-based model for morphological disambiguation of Turkish. The rules are generated by a novel decision list learning algorithm using supervised training. Morphological ambiguity (e.g. lives = live+s or life+s) is a challenging problem for agglutinative languages like Turkish, where close to half of the words in running text are morphologically ambiguous. Furthermore, a word can take an unlimited number of suffixes, so the number of possible morphological tags is unlimited. We attempt to cope with these problems by training a separate model for each of the 126 morphological features recognized by the morphological analyzer. The resulting decision lists independently vote on each of the potential parses of a word, and the final parse is selected based on our confidence in these votes. The accuracy of our model (96%) is slightly above the best previously reported results, which use statistical models. For comparison, when we train a single decision list on full tags instead of using separate models for each feature, we get 91% accuracy.
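
The voting scheme described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the rule representation, the confidence values, the feature names (`Noun`, `Verb`, `Plural`, `Pres3sg`), and the scoring rule (add a list's confidence when a parse agrees with its prediction, subtract it otherwise) are all assumptions made for illustration. The only parts taken from the abstract are that there is one decision list per morphological feature, that the lists vote independently on each candidate parse, and that the final parse is chosen from the confidence of those votes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical context representation: attribute name -> value.
Context = Dict[str, str]
# A candidate parse is modelled here as the set of features it contains.
Parse = FrozenSet[str]


@dataclass
class Rule:
    condition: Callable[[Context], bool]  # does this rule fire in the context?
    has_feature: bool                     # prediction: is the feature present?
    confidence: float                     # assumed confidence of the rule


def apply_decision_list(rules: List[Rule], ctx: Context) -> Tuple[bool, float]:
    """Return the prediction of the first rule that matches (decision-list semantics)."""
    for rule in rules:
        if rule.condition(ctx):
            return rule.has_feature, rule.confidence
    raise ValueError("a decision list needs a default rule that always matches")


def choose_parse(candidates: List[Parse],
                 lists: Dict[str, List[Rule]],
                 ctx: Context) -> Parse:
    """Let every per-feature list vote on every candidate; keep the best-scoring parse."""
    def score(parse: Parse) -> float:
        total = 0.0
        for feature, rules in lists.items():
            predicted, conf = apply_decision_list(rules, ctx)
            # Assumed scoring: reward agreement, penalise disagreement.
            total += conf if (feature in parse) == predicted else -conf
        return total
    return max(candidates, key=score)


# Toy version of the "lives = live+s or life+s" ambiguity from the abstract.
NOUN_PARSE: Parse = frozenset({"Noun", "Plural"})
VERB_PARSE: Parse = frozenset({"Verb", "Pres3sg"})

LISTS: Dict[str, List[Rule]] = {
    "Noun": [
        Rule(lambda ctx: ctx.get("prev") == "their", True, 0.9),
        Rule(lambda ctx: True, False, 0.5),   # default rule
    ],
    "Verb": [
        Rule(lambda ctx: ctx.get("prev") == "she", True, 0.9),
        Rule(lambda ctx: True, False, 0.6),   # default rule
    ],
}

if __name__ == "__main__":
    for prev in ("their", "she"):
        best = choose_parse([NOUN_PARSE, VERB_PARSE], LISTS, {"prev": prev})
        print(prev, "lives ->", sorted(best))
```

In this toy setup, the word before "lives" decides which rules fire, so "their lives" selects the noun reading and "she lives" selects the verb reading. Because each list predicts a single feature rather than a full tag, the number of models stays fixed even when the set of possible full tags is unbounded.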