Automatic word spacing using hidden Markov model for refining Korean text corpora

Authors:
Do-Gil Lee;Sang-Zoo Lee;Hae-Chang Rim;Heui-Seok Lim
Affiliations:
Korea University, Seoul, Korea;Korea University, Seoul, Korea;Korea University, Seoul, Korea;Chonan University, CheonAn, Korea
Venue:
COLING '02 Proceedings of the 3rd workshop on Asian language resources and international standardization - Volume 12
Year:
2002

Citing 1

Tagging English text with a probabilistic model
Computational Linguistics

Cited 6

A syllable based word recognition model for Korean noun extraction
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1

Automatic word spacing in Korean for small memory devices
IEA/AIE'2005 Proceedings of the 18th international conference on Innovations in Applied Artificial Intelligence

Self-organizing n-gram model for automatic word spacing
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics

Combining rule-based learning and memory-based learning for automatic word spacing in simple message service
Applied Soft Computing

Automatic word spacing of erroneous sentences in mobile devices with limited hardware resources
Information Processing and Management: an International Journal

A language independent n-gram model for word segmentation
AI'06 Proceedings of the 19th Australian joint conference on Artificial Intelligence: advances in Artificial Intelligence

Abstract

This paper proposes a word spacing model that uses a hidden Markov model (HMM) to refine raw Korean text corpora. Previous statistical approaches to automatic word spacing rely on inaccurate probabilities because they do not consider the previous spacing state. We treat word spacing as a classification problem, similar to part-of-speech (POS) tagging, and experiment with various models that consider extended context. The experimental results show that performance improves as more context is considered. When the same number of parameters is used as in another method, our model produces better results, which indicates that it is more effective.
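
To illustrate the idea of treating word spacing as a tagging problem, the following is a minimal sketch and not the model evaluated in the paper. It assumes the simplest first-order HMM: each character receives a hidden spacing tag (1 if a space follows it, 0 otherwise), transitions depend only on the previous spacing tag, and emissions depend only on the current tag. The function names (`to_tags`, `train`, `viterbi`, `respace`), the add-one smoothing, and the toy training data are all assumptions made for illustration; the extended-context models mentioned in the abstract would condition on more history than this sketch does.

```python
import math
from collections import defaultdict

TAGS = (0, 1)  # 1: a space follows this character; 0: no space follows


def to_tags(sentence):
    """Split a correctly spaced sentence into characters and spacing tags."""
    chars, tags = [], []
    for word in sentence.split():
        for i, ch in enumerate(word):
            chars.append(ch)
            tags.append(1 if i == len(word) - 1 else 0)
    return chars, tags


def train(sentences):
    """Count tag transitions and character emissions (hypothetical helper)."""
    trans = defaultdict(int)  # (previous tag, tag) -> count; "S" = sentence start
    emit = defaultdict(int)   # (tag, character) -> count
    vocab = set()
    for sentence in sentences:
        chars, tags = to_tags(sentence)
        prev = "S"
        for ch, tag in zip(chars, tags):
            trans[(prev, tag)] += 1
            emit[(tag, ch)] += 1
            vocab.add(ch)
            prev = tag
    # One extra vocabulary slot is reserved for unseen characters.
    return {"trans": dict(trans), "emit": dict(emit), "vocab_size": len(vocab) + 1}


def log_trans(model, prev, tag):
    """Add-one smoothed log P(tag | previous tag)."""
    total = sum(model["trans"].get((prev, t), 0) for t in TAGS)
    return math.log((model["trans"].get((prev, tag), 0) + 1) / (total + len(TAGS)))


def log_emit(model, tag, ch):
    """Add-one smoothed log P(character | tag)."""
    total = sum(c for (t, _), c in model["emit"].items() if t == tag)
    return math.log((model["emit"].get((tag, ch), 0) + 1) / (total + model["vocab_size"]))


def viterbi(model, chars):
    """Return the most probable spacing tag sequence for the characters."""
    if not chars:
        return []
    score = [{t: log_trans(model, "S", t) + log_emit(model, t, chars[0]) for t in TAGS}]
    back = []
    for ch in chars[1:]:
        row, ptr = {}, {}
        for t in TAGS:
            best_prev = max(TAGS, key=lambda p: score[-1][p] + log_trans(model, p, t))
            row[t] = score[-1][best_prev] + log_trans(model, best_prev, t) + log_emit(model, t, ch)
            ptr[t] = best_prev
        score.append(row)
        back.append(ptr)
    tag = max(TAGS, key=lambda t: score[-1][t])
    path = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]


def respace(model, text):
    """Discard existing spaces and re-insert them according to the decoded tags."""
    chars = [ch for ch in text if not ch.isspace()]
    tags = viterbi(model, chars)
    return "".join(ch + (" " if tag == 1 else "") for ch, tag in zip(chars, tags)).strip()


toy_model = train(["ab ab ab"])   # toy data, not a real corpus
print(respace(toy_model, "abab"))  # prints "ab ab"
```

The sketch works character by character, so it applies to Korean syllables as long as each syllable is a single precomposed character in the input. Because the transition term depends on the previous spacing tag, the decoder takes the previous spacing state into account, which is the property the abstract says earlier statistical approaches lacked.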