Statistical morphological disambiguation for agglutinative languages

Authors:
Dilek Z. Hakkani-Tür;Kemal Oflazer;Gökhan Tür
Affiliations:
Bilkent University, Ankara, Turkey;Bilkent University, Ankara, Turkey;Bilkent University, Ankara, Turkey
Venue:
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Year:
2000



Abstract

In this paper, we present statistical models for morphological disambiguation in Turkish. Turkish poses an interesting problem for statistical models because its productive derivational morphology makes the potential tag set very large. We propose to handle this by breaking up the morphosyntactic tags into inflectional groups, each of which contains the inflectional features for one (intermediate) derived form. Our statistical models estimate the probability of each morphosyntactic tag by combining statistics over its individual inflectional groups in a trigram model. Of the three models that we developed and tested, the simplest one, which ignores the local morphotactics within words, performs best. Our best trigram model achieves 93.95% accuracy on our test data, counting a tag as correct only when all of its morphosyntactic and semantic features are correct. If we consider only the syntactically relevant features and ignore a very small set of semantic features, accuracy increases to 95.07%.
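
To make the idea of inflectional groups concrete, here is a minimal Python sketch of one way a morphosyntactic tag could be split at derivation boundaries and scored with trigram-style statistics. It is an illustration under stated assumptions, not the models evaluated in the paper: the `^DB` boundary marker, the feature names, the toy training data, the add-alpha smoothing, and the choice to condition every inflectional group on the final groups of the two preceding words are all assumptions introduced here. The names `split_parse` and `IGTrigramModel` are hypothetical.

```python
from collections import defaultdict
from math import log

DB = "^DB"  # assumed derivation-boundary marker (illustrative notation)


def split_parse(parse):
    """Split 'root+Feats^DB+Feats...' into (root, [ig_1, ig_2, ...])."""
    chunks = parse.split(DB)
    root, _, first_ig = chunks[0].partition("+")
    igs = [first_ig] + [chunk.lstrip("+") for chunk in chunks[1:]]
    return root, igs


class IGTrigramModel:
    """Toy model: each inflectional group (IG) of a word is conditioned on
    the final IGs of the two preceding words. Morphotactics inside the word
    are ignored, which is an assumption of this sketch."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha                  # add-alpha smoothing (assumption)
        self.ctx_counts = defaultdict(int)  # context -> number of IGs seen
        self.ig_counts = defaultdict(int)   # (context, ig) -> count
        self.vocab = set()                  # distinct IGs seen in training

    def train(self, sentences):
        """sentences: list of lists of already disambiguated parse strings."""
        for sentence in sentences:
            last_igs = ["<s>", "<s>"]       # sentence-start padding
            for parse in sentence:
                _, igs = split_parse(parse)
                ctx = (last_igs[-2], last_igs[-1])
                for ig in igs:
                    self.ctx_counts[ctx] += 1
                    self.ig_counts[(ctx, ig)] += 1
                    self.vocab.add(ig)
                last_igs.append(igs[-1])

    def log_prob(self, parse, ctx):
        """Smoothed log-probability of all IGs of `parse` in context `ctx`."""
        _, igs = split_parse(parse)
        size = len(self.vocab) + 1          # one extra slot for unseen IGs
        denom = self.ctx_counts.get(ctx, 0) + self.alpha * size
        total = 0.0
        for ig in igs:
            numer = self.ig_counts.get((ctx, ig), 0) + self.alpha
            total += log(numer / denom)
        return total

    def best_parse(self, candidates, ctx):
        """Pick the highest-scoring candidate parse for one word."""
        return max(candidates, key=lambda p: self.log_prob(p, ctx))


# Toy usage with made-up data; tags and words are for illustration only.
model = IGTrigramModel()
model.train([["ev+Noun+Nom", "gel+Verb+Past"]] * 2)
context = ("<s>", "Noun+Nom")
print(model.best_parse(["gel+Verb+Past", "gel+Noun+Nom"], context))
```

This sketch picks the best parse for a single word given a fixed context. A full disambiguator would search over whole sentences, for example with Viterbi decoding, and would also need to model roots. The point here is only that counting over inflectional groups keeps the event space small even though the number of complete tags is very large.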