Syllable-pattern-based unknown-morpheme segmentation and estimation for hybrid part-of-speech tagging of Korean

Authors:
Gary Geunbae Lee;Jong-Hyeok Lee;Jeongwon Cha
Affiliations:
Pohang University of Science and Technology (POSTECH), Pohang, 790-784, Korea;Pohang University of Science and Technology (POSTECH), Pohang, 790-784, Korea;Pohang University of Science and Technology (POSTECH), Pohang, 790-784, Korea
Venue:
Computational Linguistics
Year:
2002

Abstract

Most errors in Korean morphological analysis and part-of-speech (POS) tagging are caused by unknown morphemes. This paper presents a generalized, syllable-pattern-based method for estimating unknown morphemes within POSTAG (POStech TAGger), a hybrid statistical and rule-based POS tagging system. The method combines a morpheme pattern dictionary, which encodes general lexical patterns of Korean morphemes, with a posteriori syllable trigram estimation. The syllable trigrams are used to calculate lexical probabilities for the unknown morphemes, and these probabilities are then used in the search for the best tagging result. The method can guess the POS tags of unknown morphemes regardless of their number or position in an eojeol (a Korean spacing unit similar to an English word), which is not possible with other systems for tagging Korean. In a series of experiments on corpora from three different domains, the system achieved 97% tagging accuracy even though 10% of the morphemes in the test corpora were unknown. It also achieved very high coverage and accuracy of estimation for all classes of unknown morphemes.
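
The abstract does not give the estimation formula, the tagset, or the contents of the morpheme pattern dictionary, so the following is only a minimal sketch of the general idea of scoring an unknown morpheme with tag-conditioned syllable trigrams. It assumes add-alpha smoothing and a simple Bayes-rule combination with a tag prior, both of which are stand-ins rather than the paper's method. The class name `SyllableTrigramGuesser`, the tags `NOUN` and `VERB`, and the toy lexicon are all hypothetical.

```python
import math
from collections import defaultdict

BOS, EOS = "<s>", "</s>"  # hypothetical boundary symbols


class SyllableTrigramGuesser:
    """Toy tag guesser: argmax over tags of log P(tag) + log P(syllables | tag)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                                   # add-alpha smoothing (assumption)
        self.tri = defaultdict(lambda: defaultdict(int))     # tag -> trigram -> count
        self.ctx = defaultdict(lambda: defaultdict(int))     # tag -> 2-syllable context -> count
        self.tag_counts = defaultdict(int)
        self.vocab = {EOS}                                   # symbols that can be predicted

    @staticmethod
    def _pad(morpheme):
        # Each precomposed (NFC) Hangul syllable is one code point,
        # so iterating over the string yields syllables.
        return [BOS, BOS] + list(morpheme) + [EOS]

    def train(self, tagged_morphemes):
        for morpheme, tag in tagged_morphemes:
            self.tag_counts[tag] += 1
            self.vocab.update(morpheme)
            syl = self._pad(morpheme)
            for i in range(2, len(syl)):
                context = (syl[i - 2], syl[i - 1])
                self.tri[tag][context + (syl[i],)] += 1
                self.ctx[tag][context] += 1

    def log_lexical_prob(self, morpheme, tag):
        """Smoothed log P(syllable sequence | tag) under a trigram model."""
        size = len(self.vocab) + 1  # one extra slot for unseen syllables
        syl = self._pad(morpheme)
        logp = 0.0
        for i in range(2, len(syl)):
            context = (syl[i - 2], syl[i - 1])
            num = self.tri[tag].get(context + (syl[i],), 0) + self.alpha
            den = self.ctx[tag].get(context, 0) + self.alpha * size
            logp += math.log(num / den)
        return logp

    def guess(self, morpheme):
        """Return the tag with the highest prior-times-lexical score."""
        total = sum(self.tag_counts.values())
        scores = {
            tag: math.log(count / total) + self.log_lexical_prob(morpheme, tag)
            for tag, count in self.tag_counts.items()
        }
        return max(scores, key=scores.get)


# Toy lexicon, invented for illustration only.
TOY_LEXICON = [
    ("학교", "NOUN"), ("학생", "NOUN"), ("학원", "NOUN"),
    ("먹", "VERB"), ("막", "VERB"),
]

guesser = SyllableTrigramGuesser()
guesser.train(TOY_LEXICON)
print(guesser.guess("학자"))  # a morpheme absent from the toy lexicon
```

In a full tagger, a lexical score of this kind would be one term in the search over tag sequences for the whole eojeol rather than a standalone classifier; the sketch leaves out that search and the pattern dictionary entirely.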