In this paper, we present a rule-based model for morphological disambiguation of Turkish. The rules are generated by a novel decision list learning algorithm using supervised training. Morphological ambiguity (e.g., lives = live+s or life+s) is a challenging problem for agglutinative languages like Turkish, where close to half of the words in running text are morphologically ambiguous. Furthermore, a word can take an unlimited number of suffixes, so the number of possible morphological tags is also unlimited. We attempt to cope with these problems by training a separate model for each of the 126 morphological features recognized by the morphological analyzer. The resulting decision lists independently vote on each of the potential parses of a word, and the final parse is selected based on our confidence in these votes. The accuracy of our model (96%) is slightly above the best previously reported results, which use statistical models. For comparison, when we train a single decision list on full tags instead of using separate models for each feature, we get 91% accuracy.
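The voting step can be illustrated with a minimal sketch. It is not the paper's implementation: the `Rule` structure, the first-match-wins reading of a decision list, the signed-sum scoring, the feature names (`Noun`, `Verb`, `Plural`, `Present`), the context attribute `prev`, and all confidence values are assumptions chosen for illustration, using the English example lives = live+s or life+s. Each per-feature decision list predicts whether its feature is present, and each candidate parse is credited or penalized by that prediction's confidence.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical context representation: attribute name -> value.
Context = Dict[str, str]


@dataclass
class Rule:
    condition: Callable[[Context], bool]  # when the rule applies
    predicts_feature: bool                # is the feature predicted present?
    confidence: float                     # assumed weight of this rule's vote


def apply_decision_list(rules: List[Rule], ctx: Context) -> Tuple[bool, float]:
    """Return the prediction and confidence of the first matching rule."""
    for rule in rules:
        if rule.condition(ctx):
            return rule.predicts_feature, rule.confidence
    return False, 0.0  # no rule fired: abstain with zero weight


def select_parse(
    candidates: List[FrozenSet[str]],
    models: Dict[str, List[Rule]],
    ctx: Context,
) -> FrozenSet[str]:
    """Pick the candidate parse with the highest signed sum of votes."""

    def score(parse: FrozenSet[str]) -> float:
        total = 0.0
        for feature, rules in models.items():
            predicted, confidence = apply_decision_list(rules, ctx)
            agrees = (feature in parse) == predicted
            total += confidence if agrees else -confidence
        return total

    return max(candidates, key=score)


# Toy models for two made-up features; the last rule in each list is a default.
models = {
    "Noun": [
        Rule(lambda c: c.get("prev") == "their", True, 0.9),
        Rule(lambda c: True, False, 0.5),
    ],
    "Verb": [
        Rule(lambda c: c.get("prev") == "she", True, 0.8),
        Rule(lambda c: True, False, 0.6),
    ],
}

noun_parse = frozenset({"Noun", "Plural"})   # life+s
verb_parse = frozenset({"Verb", "Present"})  # live+s

after_their = select_parse([noun_parse, verb_parse], models, {"prev": "their"})
after_she = select_parse([noun_parse, verb_parse], models, {"prev": "she"})
```

In this toy setup, `after_their` is the noun reading and `after_she` is the verb reading. Features that have no model (here `Plural` and `Present`) simply contribute no vote.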