Combined word-spacing method for disambiguating Korean texts

Authors:
Mi-young Kang;Aesun Yoon;Hyuk-chul Kwon
Affiliations:
Korean Language Processing Lab., School of Electrical & Computer Engineering, Pusan National University, Busan, Korea;Korean Language Processing Lab., School of Electrical & Computer Engineering, Pusan National University, Busan, Korea;Korean Language Processing Lab., School of Electrical & Computer Engineering, Pusan National University, Busan, Korea
Venue:
AI'04 Proceedings of the 17th Australian joint conference on Advances in Artificial Intelligence
Year:
2004

Abstract

In this paper, we propose an automatic word-spacing method for a Korean text preprocessing system that resolves the problem of context-dependent word-spacing. The method combines a stochastic method with partial parsing. First, the stochastic method splits an input sentence into a candidate-word sequence using word unigrams and syllable bigrams. Second, the system engages a partial parsing module based on the asymmetric relation between the candidate words. The partial parsing module manages the governing relationship using words that are registered in the knowledge base as having a high probability of spacing errors. These elements serve as parsing trigger points based on their linguistic information, and they determine the parsing direction as well as the parsing scope. By combining the stochastic and linguistic methods, the automatic word-spacing system becomes robust against the problem of context-dependent word-spacing. An average improvement of 8.98% in the total error rate is obtained for inner and external data.
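The first stage described in the abstract (splitting an unspaced sentence into candidate words with word unigrams and syllable bigrams) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the scoring formula, so the dynamic-programming search, the way the two statistics are multiplied together, and every name and probability below (`WORD_UNIGRAM`, `SPACE_BIGRAM`, `segment`, the fallback constants) are assumptions made purely for illustration. The partial parsing stage is not modeled.

```python
import math

# Hypothetical toy statistics. A real system would estimate them from a corpus.
# WORD_UNIGRAM: assumed probability of each candidate word.
WORD_UNIGRAM = {"나는": 0.04, "학교에": 0.02, "간다": 0.03, "나": 0.01, "는": 0.01}
# SPACE_BIGRAM: assumed probability that a space falls between two syllables.
SPACE_BIGRAM = {("는", "학"): 0.9, ("에", "간"): 0.8}

UNKNOWN_WORD_PROB = 1e-8   # assumed fallback for words missing from WORD_UNIGRAM
DEFAULT_SPACE_PROB = 0.5   # assumed fallback for unseen syllable pairs
MAX_WORD_LEN = 6           # assumed upper bound on candidate-word length


def segment(text):
    """Return the highest-scoring candidate-word sequence for `text`.

    Assumes precomposed (NFC) Hangul, so one character equals one syllable.
    """
    syllables = text.replace(" ", "")
    n = len(syllables)
    # best[i] = (best log-score of a segmentation of syllables[:i], start of last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)

    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = syllables[start:end]
            score = best[start][0] + math.log(WORD_UNIGRAM.get(word, UNKNOWN_WORD_PROB))
            if end < n:
                # Score the space that would follow this word.
                pair = (syllables[end - 1], syllables[end])
                score += math.log(SPACE_BIGRAM.get(pair, DEFAULT_SPACE_PROB))
            if score > best[end][0]:
                best[end] = (score, start)

    # Walk the back-pointers from the end of the sentence to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(syllables[start:end])
        end = start
    return words[::-1]


if __name__ == "__main__":
    print(" ".join(segment("나는학교에간다")))
```

With these toy values, the unspaced input `나는학교에간다` is split into `나는`, `학교에`, and `간다`. In the system the abstract describes, such a candidate sequence would then be passed to the partial parsing module, which handles the context-dependent cases that statistics alone cannot settle.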