An example-based study on Chinese word segmentation using critical fragments

  • Authors:
  • Qinan Hu;Haihua Pan;Chunyu Kit

  • Affiliations:
  • Department of Chinese, Translation and Linguistics, City University of Hong Kong, Hong Kong;Department of Chinese, Translation and Linguistics, City University of Hong Kong, Hong Kong;Department of Chinese, Translation and Linguistics, City University of Hong Kong, Hong Kong

  • Venue:
  • IJCNLP'04: Proceedings of the First International Joint Conference on Natural Language Processing
  • Year:
  • 2004

Abstract

In this study, sentences are represented as sequences of critical fragments, and a critical fragment is considered ambiguous if more than one distinct resolution for it is found in the training corpus. Unlike other studies, we disambiguate ambiguous critical fragments with an example-based system. The contexts on either side of an ambiguous critical fragment, i.e. the adjacent characters, words and critical fragments, are used to measure the distance between training and testing examples. Two measures, the overlap metric and chi-squared feature weighting, are employed, and our system achieves a precision of 93.65% and a recall of 96.56% in the open test.
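
The distance computation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the standard definitions of the overlap metric (count the context features on which two examples differ) and of chi-squared feature weighting (weight each feature by the chi-squared statistic of its value-by-resolution contingency table), and it resolves a test fragment by copying the resolution of the single nearest training example. The function names, the two-feature context (left and right neighbour), the placeholder resolutions `AB|C` and `A|BC`, and the toy data are all hypothetical.

```python
from collections import Counter

def chi_squared_weights(examples):
    """Return one chi-squared weight per context feature position.

    examples: list of (features, resolution) pairs, where features is a
    tuple of context symbols and resolution is the segmentation label.
    """
    n = len(examples)
    label_counts = Counter(label for _, label in examples)
    weights = []
    for i in range(len(examples[0][0])):
        value_counts = Counter(feats[i] for feats, _ in examples)
        joint = Counter((feats[i], label) for feats, label in examples)
        chi2 = 0.0
        for value, v_count in value_counts.items():
            for label, l_count in label_counts.items():
                expected = v_count * l_count / n   # count under independence
                observed = joint[(value, label)]   # 0 if the pair never occurs
                chi2 += (observed - expected) ** 2 / expected
        weights.append(chi2)
    return weights

def weighted_overlap(a, b, weights):
    """Overlap distance: sum the weights of the features that differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

def resolve(features, examples, weights):
    """Copy the resolution of the nearest training example (1-NN, assumed)."""
    nearest = min(examples, key=lambda ex: weighted_overlap(features, ex[0], weights))
    return nearest[1]

# Hypothetical toy data: (left neighbour, right neighbour) -> resolution.
# The left neighbour fully predicts the resolution; the right one does not.
TRAIN = [
    (("x", "p"), "AB|C"),
    (("x", "q"), "AB|C"),
    (("y", "p"), "A|BC"),
    (("y", "q"), "A|BC"),
]

WEIGHTS = chi_squared_weights(TRAIN)
```

In this toy set the weighting gives the informative left-neighbour feature a positive weight and the uninformative right-neighbour feature a weight of zero, so `resolve(("x", "r"), TRAIN, WEIGHTS)` follows the left neighbour even though the right neighbour `r` never occurs in the training examples.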