Combining segmenter and chunker for Chinese word segmentation

  • Authors:
  • Masayuki Asahara;Chooi Ling Goh;Xiaojie Wang;Yuji Matsumoto

  • Affiliations:
  • Nara Institute of Science and Technology, Japan;Nara Institute of Science and Technology, Japan;Nara Institute of Science and Technology, Japan;Nara Institute of Science and Technology, Japan

  • Venue:
  • SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
  • Year:
  • 2003

Abstract

We propose a method for Chinese word segmentation that combines a Hidden Markov Model-based word segmenter with a Support Vector Machine-based chunker. First, the Hidden Markov Model-based word segmenter analyzes each input sentence and produces n-best word candidates, together with class information and confidence measures. Second, the candidate words are broken into character units, and each character is annotated with its possible word classes and its position within the word; these annotations serve as features for the chunker. Finally, the Support Vector Machine-based chunker groups the character units into words, thereby determining the word boundaries.
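
The second step, in which candidate words are broken into characters and each character is annotated with word class and position, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The position tag set (B, I, E, S), the word-class labels (NS, NT, N), the example candidates, and the function names position_tags and character_features are all assumptions made for illustration; the abstract does not specify them. Confidence measures are omitted.

```python
def position_tags(word_length):
    """Return one position tag per character of a word.

    Assumed tag set (not given in the abstract): S for a single-character
    word, otherwise B (begin), I (inside), E (end).
    """
    if word_length == 1:
        return ["S"]
    return ["B"] + ["I"] * (word_length - 2) + ["E"]


def character_features(sentence, candidates):
    """Collect, for each character, the class-position labels it receives
    across all candidate segmentations.

    candidates: list of segmentations, each a list of (word, word_class)
    pairs whose words concatenate to the sentence.
    Returns one sorted list of "class-position" labels per character.
    """
    features = [set() for _ in sentence]
    for segmentation in candidates:
        # Reject candidates that do not cover the sentence exactly.
        if "".join(word for word, _ in segmentation) != sentence:
            raise ValueError("candidate does not match the sentence")
        offset = 0
        for word, word_class in segmentation:
            for i, tag in enumerate(position_tags(len(word))):
                features[offset + i].add(f"{word_class}-{tag}")
            offset += len(word)
    return [sorted(labels) for labels in features]


# Hypothetical 2-best output of a segmenter for one sentence.
sentence = "北京大学生"
candidates = [
    [("北京", "NS"), ("大学生", "N")],
    [("北京大学", "NT"), ("生", "N")],
]
features = character_features(sentence, candidates)
# features == [["NS-B", "NT-B"], ["NS-E", "NT-I"], ["N-B", "NT-I"],
#              ["N-I", "NT-E"], ["N-E", "N-S"]]
```

In this sketch, each character's label list would be passed to the chunker as features, and the chunker would then decide which position tag each character actually takes, which fixes the word boundaries.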