Tagging and morphological disambiguation of Turkish text

Authors:
Kemal Oflazer; İlker Kuruöz
Affiliations:
Bilkent University, Bilkent, Ankara, Turkey; Bilkent University, Bilkent, Ankara, Turkey
Venue:
ANLC '94: Proceedings of the Fourth Conference on Applied Natural Language Processing
Year:
1994

Abstract

Automatic text tagging is an important component in higher-level analysis of text corpora, and its output can be used in many natural language processing applications. In languages with agglutinative morphology, such as Turkish or Finnish, morphological disambiguation is a crucial part of tagging, as many lexical forms are morphologically ambiguous. This paper describes a POS tagger for Turkish text based on a full-scale two-level specification of Turkish morphology, built on a lexicon of about 24,000 root words. The tagger is augmented with a recognizer for multiword and idiomatic constructs and, most importantly, a morphological disambiguator based on local neighborhood constraints, heuristics, and a limited amount of statistical information. The tagger also provides functionality for compiling statistics and fine-tuning the morphological analyzer, such as logging erroneous morphological parses and commonly used roots. Preliminary results indicate that the tagger can tag about 98-99% of the text accurately with minimal user intervention. Furthermore, for sentences morphologically disambiguated with the tagger, an LFG parser developed for Turkish generates on average 50% fewer ambiguous parses and parses almost 2.5 times faster. The tagging functionality is not specific to Turkish and can be applied to any language with a suitable morphological analysis interface.
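
The abstract describes the disambiguator only at a high level (local neighborhood constraints, heuristics, limited statistics). As an illustration of the general constraint-based idea, here is a minimal sketch, assuming that each token carries the list of candidate parses returned by the morphological analyzer and that rules delete candidates when a neighbor is already unambiguous, repeating until nothing changes. The names `Token`, `disambiguate` and `noun_follows_drop_verb`, the simplified tag tuples, and the single rule are hypothetical and are not the authors' rule set or tag inventory.

```python
from dataclasses import dataclass
from typing import Callable

# A parse is a tuple of (simplified, hypothetical) features: root, then tags.
Parse = tuple[str, ...]


@dataclass
class Token:
    word: str
    parses: list[Parse]  # candidate parses from the morphological analyzer


# A rule looks at position i and returns the parses it wants to delete there.
Rule = Callable[[list[Token], int], set[Parse]]


def unambiguous(tok: Token, tag: str) -> bool:
    """True if every remaining parse of tok carries the given tag."""
    return all(tag in p for p in tok.parses)


def noun_follows_drop_verb(sent: list[Token], i: int) -> set[Parse]:
    """Hypothetical rule: before an unambiguous noun, delete Verb readings."""
    if i + 1 < len(sent) and unambiguous(sent[i + 1], "Noun"):
        return {p for p in sent[i].parses if "Verb" in p}
    return set()


def disambiguate(sent: list[Token], rules: list[Rule], max_passes: int = 10) -> list[Token]:
    """Apply deletion rules repeatedly until no parse is removed (a fixpoint)."""
    for _ in range(max_passes):
        changed = False
        for i, tok in enumerate(sent):
            for rule in rules:
                if len(tok.parses) < 2:
                    break  # already unambiguous: nothing to delete
                drop = rule(sent, i)
                keep = [p for p in tok.parses if p not in drop]
                # Never delete the last remaining parse of a token.
                if keep and len(keep) < len(tok.parses):
                    tok.parses = keep
                    changed = True
        if not changed:
            break
    return sent


if __name__ == "__main__":
    # "Yüz kitap oku." ("Read a hundred books."); yüz: face / hundred / swim.
    sent = [
        Token("yüz", [("yüz", "Noun"), ("yüz", "Num"), ("yüz", "Verb")]),
        Token("kitap", [("kitap", "Noun")]),
        Token("oku", [("oku", "Verb")]),
    ]
    for tok in disambiguate(sent, [noun_follows_drop_verb]):
        print(tok.word, tok.parses)
```

Running it removes only the Verb reading of "yüz" and leaves its Noun and Num readings, which illustrates why purely local deletion rules are typically complemented by heuristics or statistics to resolve what remains.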