Word identification for Mandarin Chinese sentences

  • Authors: Keh-Jiann Chen; Shing-Huan Liu
  • Affiliations: Institute of Information Science, Academia Sinica (both authors)
  • Venue: COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 1
  • Year: 1992

Abstract

Chinese sentences are composed of strings of characters, with no blanks to mark word boundaries. However, the basic unit for sentence parsing and understanding is the word. Therefore, the first step in processing Chinese sentences is to identify the words. The difficulties of identifying words include (1) the identification of complex words, such as Determinative-Measure compounds, reduplications, and derived words; (2) the identification of proper names; and (3) the resolution of ambiguous segmentations. In this paper, we propose possible solutions to these difficulties. We adopt a matching algorithm with six different heuristic rules to resolve the ambiguities, and it achieves a success rate of 99.77%. The statistical data support the conclusion that maximal matching is the most effective heuristic.
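
To make the idea of maximal matching concrete, the following is a minimal sketch of a plain greedy, left-to-right longest-match segmenter. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not list the six heuristic rules, so none of them are reproduced here, and the function name `max_match` and the toy lexicon are hypothetical.

```python
def max_match(sentence, lexicon, max_len=None):
    """Greedy forward longest-match segmentation (illustrative sketch only).

    `lexicon` is a set of known words. Characters that start no lexicon
    word are emitted as single-character words.
    """
    if max_len is None:
        # Longest lexicon entry bounds the candidate length.
        max_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            # A single character is always accepted as a fallback.
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy lexicon, not taken from the paper.
TOY_LEXICON = {"研究", "研究生", "生命", "起源", "的"}

segmented = max_match("研究生命的起源", TOY_LEXICON)
print(segmented)  # ['研究生', '命', '的', '起源']
```

On this input the greedy choice of 研究生 ("graduate student") blocks the intended reading 研究 / 生命 / 的 / 起源 ("study the origin of life"). This is an instance of the ambiguous segmentations mentioned in the abstract, and it suggests why a bare longest-match pass may need additional disambiguation rules.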