Combined word-spacing method for disambiguating Korean texts

  • Authors:
  • Mi-young Kang;Aesun Yoon;Hyuk-chul Kwon

  • Affiliations:
  • Korean Language Processing Lab., School of Electrical & Computer Engineering, Pusan National University, Busan, Korea (all authors)

  • Venue:
  • AI'04 Proceedings of the 17th Australian Joint Conference on Advances in Artificial Intelligence
  • Year:
  • 2004

Abstract

In this paper, we propose an automatic word-spacing method for a Korean text preprocessing system that resolves the problem of context-dependent word spacing. The method combines a stochastic method with partial parsing. First, the stochastic method splits an input sentence into a candidate-word sequence using word unigrams and syllable bigrams. Second, the system engages a partial parsing module based on the asymmetric relation between the candidate words. The partial parsing module manages the governing relationship using words that are registered in the knowledge base as having a high probability of spacing errors. These elements serve as parsing trigger points based on their linguistic information, and they determine the parsing direction as well as the parsing scope. By combining the stochastic and linguistic methods, the automatic word-spacing system becomes robust against the problem of context-dependent word spacing. An average 8.98% improvement in the total error rate is obtained for inner and external data.
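
As a rough illustration of the first, stochastic step, the sketch below splits an unspaced string into a candidate-word sequence by maximizing the product of word unigram probabilities with dynamic programming. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `segment`, the `UNIGRAM` table, its probability values, and the unknown-syllable penalty are all hypothetical, and the syllable bigrams and the partial parsing module described in the abstract are not modeled.

```python
import math

# Hypothetical toy unigram probabilities (illustrative values only,
# not taken from the paper or from any corpus).
UNIGRAM = {
    "아버지": 0.02,
    "아버지가": 0.01,
    "가": 0.05,
    "방": 0.01,
    "에": 0.05,
    "방에": 0.01,
    "가방": 0.01,
    "가방에": 0.004,
    "들어가신다": 0.005,
}

# Assumed log-probability penalty for a single syllable missing from the table.
UNKNOWN_LOGP = math.log(1e-8)


def segment(text, unigram=UNIGRAM, max_len=6):
    """Return the word sequence with the highest unigram log-probability."""
    n = len(text)
    best = [0.0] + [-math.inf] * n   # best[i]: best score for text[:i]
    back = [0] * (n + 1)             # back[i]: start index of the last word
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in unigram:
                logp = math.log(unigram[piece])
            elif end - start == 1:
                logp = UNKNOWN_LOGP  # fall back to a lone unknown syllable
            else:
                continue
            score = best[start] + logp
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end to recover the words.
    words = []
    pos = n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    return words[::-1]


if __name__ == "__main__":
    print(" ".join(segment("아버지가방에들어가신다")))
```

With these toy values, the spacing "아버지가 방에 들어가신다" narrowly outscores "아버지 가방에 들어가신다". A purely stochastic splitter decides such context-dependent cases on small probability differences, which is the kind of ambiguity the abstract assigns to the partial parsing step.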