Chinese unknown word identification using character-based tagging and chunking

  • Authors:
  • Goh Chooi Ling;Masayuki Asahara;Yuji Matsumoto

  • Affiliations:
  • Nara Institute of Science and Technology;Nara Institute of Science and Technology;Nara Institute of Science and Technology

  • Venue:
  • ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 2
  • Year:
  • 2003

Abstract

Because written Chinese has no spaces to delimit words, segmenting Chinese text is an essential task. During segmentation, the problem of unknown words arises: it is impossible to register all words in a dictionary, as new words can always be created by combining characters. We propose a unified solution for detecting unknown words in Chinese texts. First, morphological analysis is performed to obtain an initial segmentation and POS tags; then a chunker is used to detect unknown words.
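
The two-step pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the B/I/E/S position tags, the B/I/O chunk labels, and the POS tag names are all assumptions made for illustration, and the chunker itself is replaced by hand-written labels. The first function expands an initial segmentation with POS tags into per-character tags; the second collects chunk labels into candidate unknown words.

```python
from typing import List, Tuple


def to_char_features(words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Expand (word, POS) pairs into (character, 'POS-position') pairs.

    Position tags (assumed scheme): B = begin, I = inside, E = end,
    S = single-character word.
    """
    feats = []
    for word, pos in words:
        if len(word) == 1:
            feats.append((word, f"{pos}-S"))
            continue
        for i, ch in enumerate(word):
            if i == 0:
                position = "B"
            elif i == len(word) - 1:
                position = "E"
            else:
                position = "I"
            feats.append((ch, f"{pos}-{position}"))
    return feats


def decode_chunks(chars: List[str], labels: List[str]) -> List[str]:
    """Collect runs labelled B, I, I, ... into candidate unknown words.

    Labels (assumed scheme): B = first character of an unknown word,
    I = continuation, O = not part of an unknown word.
    """
    found = []
    current = ""
    for ch, label in zip(chars, labels):
        if label == "B":
            if current:
                found.append(current)
            current = ch
        elif label == "I" and current:
            current += ch
        else:
            if current:
                found.append(current)
            current = ""
    if current:
        found.append(current)
    return found


# Hypothetical output of the first step: the analyzer has split an
# out-of-dictionary two-character word into two single characters.
segmented = [("我", "PN"), ("喜", "VV"), ("欢", "VV"), ("它", "PN")]
features = to_char_features(segmented)

# Hand-written labels standing in for the output of a trained chunker.
chunk_labels = ["O", "B", "I", "O"]
unknown_words = decode_chunks([ch for ch, _ in features], chunk_labels)
```

In a real system the chunk labels would come from a trained sequence labeller that takes the per-character tags as features; the sketch only shows the data flow between the two steps.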