Word segmentation standard in Chinese, Japanese and Korean

  • Authors:
  • Key-Sun Choi; Hitoshi Isahara; Kyoko Kanzaki; Hansaem Kim; Seok Mun Pak; Maosong Sun

  • Affiliations:
  • KAIST, Daejeon, Korea; NICT, Kyoto, Japan; NICT, Kyoto, Japan; National Inst., Korean Lang., Seoul, Korea; Baekseok Univ., Cheonan, Korea; Tsinghua Univ., Beijing, China

  • Venue:
  • ALR7 Proceedings of the 7th Workshop on Asian Language Resources
  • Year:
  • 2009

Abstract

Word segmentation is the process of dividing a sentence into meaningful units called "word units" [ISO/DIS 24614-1]. What counts as a word unit is judged by principles governing its internal integrity and by constraints on its external use. A word unit's internal structure is bound by principles such as lexical integrity and unpredictability, so that it represents one syntactically meaningful unit. Principles for external use include language economy and frequency: word units can be registered in a lexicon or other storage, which in practice reduces the complexity of the syntactic processing that follows word segmentation. These principles are applied to Chinese, Japanese and Korean, and the impact of the standard is discussed.
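
To illustrate how registering word units in a lexicon can drive segmentation, here is a minimal sketch of greedy forward maximum matching. It is not the procedure defined in ISO/DIS 24614-1 or the authors' method; the function name `segment` and the toy lexicon are assumptions made only for this example.

```python
def segment(sentence, lexicon, max_len=None):
    """Greedy forward maximum matching against a lexicon (illustrative only)."""
    if max_len is None:
        # Longest registered unit bounds the lookup window; 1 if the lexicon is empty.
        max_len = max((len(w) for w in lexicon), default=1)
    units = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted as a fallback unit.
            if size == 1 or candidate in lexicon:
                units.append(candidate)
                i += size
                break
    return units


# Hypothetical toy lexicon, not taken from the standard.
toy_lexicon = {"北京", "大学", "北京大学", "学生"}
print(segment("北京大学学生", toy_lexicon))
```

Because "北京大学" is registered as a single unit, the sketch returns it whole instead of splitting it into "北京" and "大学". Whether such a compound should be registered at all is the kind of decision the principles above are meant to guide.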