A parsing method for identifying words in Mandarin Chinese sentences

  • Authors:
  • Liang-Jyh Wang;Tzusheng Pei;Wei-Chuan Li;Lih-Ching R. Huang

  • Affiliations:
  • Application Software Department, Computer and Communication Research Laboratories, Industrial Technology Research Institute, Chutung, Hsinchu, Taiwan, R.O.C.;Advanced Technology Center, CCL, ITRI, Chutung, Hsinchu, Taiwan, R.O.C.;Application Software Department, Computer and Communication Research Laboratories, Industrial Technology Research Institute, Chutung, Hsinchu, Taiwan, R.O.C.;Application Software Department, Computer and Communication Research Laboratories, Industrial Technology Research Institute, Chutung, Hsinchu, Taiwan, R.O.C.

  • Venue:
  • IJCAI'91 Proceedings of the 12th International Joint Conference on Artificial Intelligence - Volume 2
  • Year:
  • 1991

Abstract

This paper presents a parsing method for identifying words in Mandarin Chinese sentences. The identification system consists of Tomita's parser augmented with tests (originally part of the English-Chinese machine translation system CCL-ECMT), together with the associated augmented context-free grammar for word composition. The simple augmented grammar, combined with the score function, effectively captures the intuitive idea of the longest possible composition of Chinese words in sentences and, at the same time, takes the frequency counts of words into consideration. The identification rate of this system on corpora taken from books and a newspaper is 99.6%. The system is simple, yet its identification rate is relatively high. The minimal unit for word-composition parsing is the character, whereas sentence parsing operates on Chinese words. The system has the potential to incorporate phrase structures and semantic checking. In this way, word identification, syntactic analysis, and even semantic analysis can be organized into a single phase. The results of testing the word identification on corpora taken from books and a Chinese newspaper are also presented.
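
The idea of preferring the longest possible word composition while also weighing word frequency can be illustrated with a small sketch. This is not the paper's method: the abstract describes Tomita's parser with an augmented context-free grammar, whereas the code below swaps in a plain dynamic-programming search over character positions. The lexicon, its frequency counts, the scoring rule (fewest words first, ties broken by the larger sum of log-frequencies), and the function name `segment` are all assumptions made for illustration.

```python
from math import log

# Hypothetical toy lexicon: word -> frequency count (illustrative values only).
LEXICON = {
    "研究": 50, "研究生": 20, "生命": 40, "命": 5, "生": 10,
    "起源": 15, "的": 1000,
}

def segment(sentence, lexicon=LEXICON):
    """Split a sentence into words, starting from single characters.

    Assumed score: fewer words is better (longer compositions);
    ties are broken in favour of the larger sum of log-frequencies.
    """
    n = len(sentence)
    max_len = max(map(len, lexicon))
    # best[i] = (word_count, negative_log_freq_sum, words) for sentence[:i]
    best = [None] * (n + 1)
    best[0] = (0, 0.0, [])
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = sentence[j:i]
            if piece in lexicon:
                freq = lexicon[piece]
            elif i - j == 1:
                freq = 1  # unknown character: treat it as a one-character word
            else:
                continue  # multi-character piece that is not a known word
            prev = best[j]
            cand = (prev[0] + 1, prev[1] - log(freq), prev[2] + [piece])
            # Compare only the numeric score, not the word lists.
            if best[i] is None or cand[:2] < best[i][:2]:
                best[i] = cand
    return best[n][2]

print(segment("研究生命的起源"))
```

With this toy lexicon, the two four-word candidates 研究/生命/的/起源 and 研究生/命/的/起源 tie on word count, and the frequency term selects the first. A grammar-based parser of the kind the abstract describes could in principle apply a similar score to competing parses, but how the paper's score function is actually defined is not stated here.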