Automatic text tagging is an important component of higher-level analysis of text corpora, and its output can be used in many natural language processing applications. In languages with agglutinative morphology, such as Turkish or Finnish, morphological disambiguation is a crucial step in tagging, because the structures of many lexical forms are morphologically ambiguous. This paper describes a POS tagger for Turkish text built on a full-scale two-level specification of Turkish morphology with a lexicon of about 24,000 root words. This is augmented with a recognizer for multiword and idiomatic constructs and, most importantly, a morphological disambiguator based on local neighborhood constraints, heuristics, and a limited amount of statistical information. The tagger also provides functionality for compiling statistics and fine-tuning the morphological analyzer, such as logging erroneous morphological parses and commonly used roots. Preliminary results indicate that the tagger can tag about 98-99% of the texts accurately with minimal user intervention. Furthermore, for sentences morphologically disambiguated with the tagger, an LFG parser developed for Turkish generates, on average, 50% fewer ambiguous parses and parses almost 2.5 times faster. The tagging functionality is not specific to Turkish and can be applied to any language with a suitable morphological analysis interface.
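To make the idea of disambiguation by local neighborhood constraints concrete, the following is a minimal Python sketch. It is not the tagger's actual rule formalism: the tag names, the two rules, and the function names (`prefer_noun_after_determiner`, `prefer_verb_sentence_final`, `disambiguate`) are all hypothetical and chosen only for illustration. It assumes a morphological analyzer has already produced a list of candidate parses for each token, and it applies context rules that discard parses until no rule changes anything.

```python
from typing import Callable, List

# Each token carries the candidate parses proposed by a morphological analyzer.
# The tags below are illustrative only, not the tag set used by the tagger.
Parses = List[str]
Sentence = List[Parses]

# A constraint looks at a token and its neighbors and returns the parses to keep.
Constraint = Callable[[Sentence, int], Parses]


def prefer_noun_after_determiner(sent: Sentence, i: int) -> Parses:
    """Hypothetical rule: after an unambiguous determiner, keep only noun parses."""
    if i > 0 and sent[i - 1] == ["Det"]:
        return [p for p in sent[i] if p.startswith("Noun")]
    return sent[i]


def prefer_verb_sentence_final(sent: Sentence, i: int) -> Parses:
    """Hypothetical rule: in sentence-final position, keep only verb parses."""
    if i == len(sent) - 1:
        return [p for p in sent[i] if p.startswith("Verb")]
    return sent[i]


def disambiguate(sent: Sentence, constraints: List[Constraint]) -> Sentence:
    """Apply the constraints repeatedly until no parse list shrinks any further."""
    sent = [list(parses) for parses in sent]  # work on a copy
    changed = True
    while changed:
        changed = False
        for i in range(len(sent)):
            for rule in constraints:
                kept = rule(sent, i)
                # Accept a rule's verdict only if it leaves at least one parse
                # and actually removes something.
                if kept and len(kept) < len(sent[i]):
                    sent[i] = kept
                    changed = True
    return sent


# Toy input: the second and third tokens are each two-ways ambiguous.
ambiguous = [
    ["Det"],
    ["Noun+A3sg", "Verb+Imp"],
    ["Verb+Past", "Noun+A3sg"],
]
rules = [prefer_noun_after_determiner, prefer_verb_sentence_final]
result = disambiguate(ambiguous, rules)
print(result)
```

The loop terminates because every accepted rule application strictly reduces the number of remaining parses, and a rule is ignored if it would remove all parses of a token. A fuller system along the lines the abstract describes would add heuristics and statistical preferences for tokens that the constraints leave ambiguous.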