Automatic word spacing using hidden Markov model for refining Korean text corpora

  • Authors:
  • Do-Gil Lee;Sang-Zoo Lee;Hae-Chang Rim;Heui-Seok Lim

  • Affiliations:
  • Korea University, Seoul, Korea;Korea University, Seoul, Korea;Korea University, Seoul, Korea;Chonan University, CheonAn, Korea

  • Venue:
  • COLING '02 Proceedings of the 3rd workshop on Asian language resources and international standardization - Volume 12
  • Year:
  • 2002

Abstract

This paper proposes a word spacing model based on a hidden Markov model (HMM) for refining Korean raw text corpora. Previous statistical approaches to automatic word spacing have relied on inaccurate probabilities because they do not consider the previous spacing state. We treat word spacing as a classification problem similar to part-of-speech (POS) tagging, and we have experimented with various models that consider extended context. The experimental results show that performance improves as more context is considered. When the same number of parameters is used as in other methods, our model achieves better results, which shows that it is more effective.
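
To make the tagging view of word spacing concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest first-order HMM: each syllable gets a tag (1 if a space follows it, 0 otherwise), transitions depend only on the previous spacing tag, and emissions depend only on the current tag. The extended-context models mentioned in the abstract are not reproduced here. The function names (`to_tagged`, `train`, `space`) and the add-one smoothing are illustrative assumptions.

```python
import math
from collections import defaultdict


def to_tagged(sentence):
    """Convert a correctly spaced sentence into (syllable, tag) pairs.

    Tag 1 means a space follows the syllable; tag 0 means it does not.
    The last syllable of every word, including the final word, gets tag 1.
    """
    pairs = []
    for word in sentence.split():
        for i, ch in enumerate(word):
            pairs.append((ch, 1 if i == len(word) - 1 else 0))
    return pairs


def train(sentences):
    """Count tag transitions and syllable emissions from spaced text."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][syllable]
    vocab = set()
    for sentence in sentences:
        prev = 1  # assumption: sentence start behaves like "after a space"
        for ch, tag in to_tagged(sentence):
            trans[prev][tag] += 1
            emit[tag][ch] += 1
            vocab.add(ch)
            prev = tag
    return trans, emit, vocab


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability (a smoothing choice assumed here)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def space(text, model):
    """Re-space text with Viterbi decoding over the two spacing tags."""
    trans, emit, vocab = model
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return ""
    v = len(vocab) + 1  # one extra outcome reserved for unseen syllables
    # best[t] = (log score, tag path) of the best path ending in tag t
    best = {
        t: (log_prob(trans[1], t, 2) + log_prob(emit[t], chars[0], v), [t])
        for t in (0, 1)
    }
    for ch in chars[1:]:
        new = {}
        for t in (0, 1):
            score, path = max(
                (best[p][0] + log_prob(trans[p], t, 2), best[p][1])
                for p in (0, 1)
            )
            new[t] = (score + log_prob(emit[t], ch, v), path + [t])
        best = new
    _, tags = max(best.values())
    out = []
    for ch, t in zip(chars, tags):
        out.append(ch)
        if t == 1:
            out.append(" ")
    return "".join(out).strip()


if __name__ == "__main__":
    # Toy corpus for illustration only; real training needs a large corpus.
    model = train(["나는 학교에 간다", "나는 집에 간다"])
    print(space("나는학교에간다", model))
```

Because the transition term conditions on the previous spacing tag, this sketch reflects the point made in the abstract about taking the previous spacing state into account. Conditioning the probabilities on neighboring syllables as well would be one way to extend the context, at the cost of more parameters.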