Part-of-Speech Tagging Using Word Probability Based on Category Patterns

Authors:
Mi-Young Kang;Sung-Won Jung;Kyung-Soon Park;Hyuk-Chul Kwon
Affiliations:
Pusan National University, Korean Language Processing Laboratory, Department of Computer Science Engineering;Pusan National University, Korean Language Processing Laboratory, Department of Computer Science Engineering, and Pusan National University, Center for U-Port IT Research and Education;Nara Info Tech co., ltd, Jangjeon-dong, Geumjeong-gu, 609-735, Busan, Korea;Pusan National University, Korean Language Processing Laboratory, Department of Computer Science Engineering, and Pusan National University, Center for U-Port IT Research and Education
Venue:
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2009

Abstract

This paper presents a part-of-speech (POS, category) tagging method based on word probability, which is estimated from morpheme unigrams and the category patterns within a word. Word-N-gram-based POS-tagging models are difficult to adapt to agglutinative languages such as Korean, Turkish and Hungarian because words in these languages are highly productive. For this reason, many stochastic studies on Korean POS tagging have been based on morpheme N-grams. However, the morpheme-N-gram model suffers from data sparseness when contextual information is added to secure sufficient performance. It also has difficulty capturing the relationships among morphemes within a word. The proposed POS-tagging algorithm (a) resolves the data-sparseness problem by taking a morpheme-unigram-based approach and (b) accounts for the relationships among morphemes within a word by estimating the weight of each morpheme's category in the category pattern that constitutes the word. The proposed model achieved performance similar to that of models that use more than morpheme unigrams.
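
The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea and not the authors' model. It assumes that a candidate analysis of a word is scored as the product of its morpheme-unigram probabilities multiplied by a single weight for the category pattern of the word, which simplifies the per-category weighting the abstract describes. All names (`MORPHEME_UNIGRAM`, `PATTERN_WEIGHT`, `word_score`, `best_analysis`), the tag labels, and every number are hypothetical and chosen purely for illustration.

```python
from math import prod

# Toy morpheme-unigram probabilities P(morpheme, category).
# Invented numbers for illustration; not taken from the paper.
MORPHEME_UNIGRAM = {
    ("na", "NP"): 0.010,     # pronoun "I"
    ("neun", "JX"): 0.030,   # topic particle
    ("nal", "VV"): 0.001,    # verb stem "fly"
    ("neun", "ETM"): 0.020,  # adnominal ending
}

# Toy weights for the category pattern inside one word.
# Assumption: one weight per whole pattern (a simplification).
PATTERN_WEIGHT = {
    ("NP", "JX"): 0.6,
    ("VV", "ETM"): 0.4,
}


def word_score(analysis, floor=1e-9):
    """Score one candidate analysis, a list of (morpheme, category) pairs.

    Unseen morphemes or patterns fall back to a small floor value.
    """
    unigram_part = prod(MORPHEME_UNIGRAM.get(mc, floor) for mc in analysis)
    pattern = tuple(category for _, category in analysis)
    return unigram_part * PATTERN_WEIGHT.get(pattern, floor)


def best_analysis(candidates):
    """Return the highest-scoring analysis among the candidates for one word."""
    return max(candidates, key=word_score)


# Two competing analyses of the same surface word ("naneun").
candidates = [
    [("na", "NP"), ("neun", "JX")],     # "I" + topic particle
    [("nal", "VV"), ("neun", "ETM")],   # "fly" + adnominal ending
]
best = best_analysis(candidates)
```

Because each word is scored from unigram statistics and its own internal pattern, no counts over sequences of neighbouring morphemes are needed in this sketch, which is the property the abstract credits for avoiding data sparseness.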