Hidden Markov model-based Korean part-of-speech tagging considering high agglutinativity, word-spacing, and lexical correlativity

Authors:
Sang-Zoo Lee;Jun-ichi Tsujii;Hae-Chang Rim
Affiliations:
University of Tokyo, Tokyo, Japan;University of Tokyo, Tokyo, Japan;Korea University, Seongbuk-Gu, Seoul, Korea
Venue:
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Year:
2000

Abstract

In this paper, we present hidden Markov models for Korean part-of-speech tagging that consider characteristics of Korean such as high agglutinativity, word-spacing, and high lexical correlativity. To exploit rich contextual information, the models adopt a less strict Markov assumption. Because the models have a large number of parameters, the sparse-data problem is very serious and the parameters tend to be estimated unreliably. To overcome the sparse-data problem, our models use a simplified version of the well-known back-off smoothing method. To mitigate the unreliable-estimation problem, our models assume joint independence instead of conditional independence, because joint probabilities have the same degree of estimation reliability. Experimental results show that models with rich contexts perform even better than standard HMMs and that the joint independence assumption is effective in some models.
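The abstract does not spell out the simplified back-off scheme. As a rough illustration only, the sketch below applies a generic Katz-style back-off with a fixed absolute discount to tag-transition probabilities conditioned on the previous `order` tags. This is a swapped-in, textbook-style technique, not the paper's formula. The class name `BackoffTagModel`, the `discount` value, the `<s>` padding tag, and the toy tag sequences are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Sequence, Tuple

History = Tuple[str, ...]


class BackoffTagModel:
    """Hypothetical sketch of Katz-style back-off with a fixed absolute
    discount for tag transitions P(tag | previous `order` tags).
    Not the paper's own smoothing formula."""

    def __init__(self, sentences: Iterable[Sequence[str]],
                 order: int = 2, discount: float = 0.5) -> None:
        assert order >= 1 and 0.0 < discount < 1.0
        self.order = order
        self.d = discount
        self.counts: Dict[History, Counter] = defaultdict(Counter)
        self.tagset = set()
        for tags in sentences:
            # pad with a hypothetical sentence-start tag "<s>"
            padded = ["<s>"] * order + list(tags)
            for i in range(order, len(padded)):
                tag = padded[i]
                self.tagset.add(tag)
                # count the tag under every history length 0..order
                for k in range(order + 1):
                    self.counts[tuple(padded[i - k:i])][tag] += 1

    def prob(self, tag: str, history: Sequence[str]) -> float:
        """P(tag | history); `tag` is assumed to be in the training tag set."""
        history = tuple(history)[-self.order:]
        seen = self.counts.get(history, Counter())
        total = sum(seen.values())
        if not history:
            # add-one unigram over the observed tag set: every tag gets mass
            return (seen[tag] + 1) / (total + len(self.tagset))
        shorter = history[1:]
        if total == 0:
            # context never observed in training: back off entirely
            return self.prob(tag, shorter)
        unseen = [t for t in self.tagset if seen[t] == 0]
        if not unseen:
            # every tag followed this context: plain relative frequency
            return seen[tag] / total
        if seen[tag] > 0:
            # discounted relative frequency for observed (history, tag) pairs
            return (seen[tag] - self.d) / total
        # mass freed by discounting, shared among unseen tags in proportion
        # to their lower-order (backed-off) probabilities
        alpha = self.d * len(seen) / total
        norm = sum(self.prob(t, shorter) for t in unseen)
        return alpha * self.prob(tag, shorter) / norm


if __name__ == "__main__":
    toy = [["N", "J", "V", "E"], ["N", "J", "N", "J", "V", "E"], ["A", "N", "J", "V", "E"]]
    m = BackoffTagModel(toy)
    print(m.prob("V", ["N", "J"]))  # 0.625 = (3 - 0.5) / 4
```

Only the probability mass removed by discounting is passed down to the shorter history, so each conditional distribution still sums to one while unseen tag sequences receive nonzero probability. The paper's own simplification, and its joint-independence variant, may differ from this.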