Learning case-based knowledge for disambiguating Chinese word segmentation: a preliminary study

  • Authors:
  • Chunyu Kit;Haihua Pan;Hongbiao Chen

  • Affiliations:
  • City University of Hong Kong;City University of Hong Kong;City University of Hong Kong

  • Venue:
  • SIGHAN '02 Proceedings of the first SIGHAN workshop on Chinese language processing - Volume 18
  • Year:
  • 2002

Abstract

Just like other NLP applications, a serious problem with Chinese word segmentation lies in the ambiguities involved. Disambiguation methods fall into different categories, e.g., rule-based, statistical-based and example-based approaches, each of which may involve a variety of machine learning techniques. In this paper we report our current progress within the example-based approach, including its framework, example representation and collection, example matching and application. Experimental results show that this effective approach resolves more than 90% of ambiguities found. Hence, if it is integrated effectively with a segmentation method of the precision P > 95%, the resulting segmentation accuracy can reach, theoretically, beyond 99.5%.
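
The arithmetic behind the 99.5% figure is not spelled out in the abstract. One plausible reading, offered here only as a sketch, is that all errors left by the baseline segmenter are assumed to stem from ambiguities and that the example-based step resolves a fixed fraction of them. Under that assumption, 0.95 + 0.90 × 0.05 = 0.995. The function name `combined_accuracy` and its parameters are hypothetical and do not come from the paper.

```python
def combined_accuracy(base_precision: float, resolution_rate: float) -> float:
    """Theoretical accuracy after disambiguation.

    Assumption (not stated in the abstract): every error left by the
    baseline segmenter is an ambiguity error, and a fraction
    `resolution_rate` of those errors is resolved correctly.
    """
    residual_error = 1.0 - base_precision
    return base_precision + resolution_rate * residual_error


if __name__ == "__main__":
    # Boundary values from the abstract: P = 95%, 90% of ambiguities resolved.
    print(f"{combined_accuracy(0.95, 0.90):.3f}")
```

Because the abstract states both figures as lower bounds (precision above 95%, more than 90% of ambiguities resolved), this reading yields a result above 99.5%, consistent with the "beyond 99.5%" wording.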