Applying an NVEF word-pair identifier to the Chinese syllable-to-word conversion problem

  • Authors:
  • Jia-Lin Tsai; Wen-Lian Hsu

  • Affiliations:
  • Institute of Information Science, Academia Sinica, Nankang, Taipei, Taiwan, R.O.C.; Institute of Information Science, Academia Sinica, Nankang, Taipei, Taiwan, R.O.C.

  • Venue:
  • COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
  • Year:
  • 2002

Abstract

Syllable-to-word (STW) conversion is important in Chinese phonetic input methods and speech recognition. STW conversion poses two major problems: (1) resolving the ambiguity caused by homonyms and (2) determining the word segmentation. This paper describes a noun-verb event-frame (NVEF) word identifier that can be used to solve these problems effectively. Our approach includes (a) an NVEF word-pair identifier and (b) other word identifiers for the non-NVEF portion.

Our experiment showed that the NVEF word-pair identifier achieves 99.66% STW accuracy on the NVEF-related portion and that, combined with other identifiers for the non-NVEF portion, the overall STW accuracy is 96.50%.

The results indicate that NVEF knowledge is very powerful for STW conversion. In fact, numerous cases requiring disambiguation in natural language processing fall into such a "chicken-and-egg" situation. NVEF knowledge can be employed as a general tool in such systems: it disambiguates the NVEF-related portion independently (thus breaking the chicken-and-egg situation), and that result serves as a sound basis for treating the remaining portion. This suggests that NVEF knowledge is likely to be important for general NLP. To further expand its coverage, we shall extend the study of NVEF to other co-occurrence restrictions, such as noun-noun, noun-adjective, and verb-adverb pairs. We believe that STW accuracy can be further improved with this additional knowledge.
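
The two-stage idea in the abstract (fix the NVEF-related portion first, then treat the rest) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy lexicon, the single noun-verb pair, the toneless syllables, and the longest-match fallback standing in for the "other word identifiers" are all hypothetical choices made for illustration.

```python
# Hypothetical toy lexicon: syllable tuple -> list of (word, part of speech).
# The first entry of each list is what a naive fallback would pick.
LEXICON = {
    ("chi",): [("池", "N"), ("吃", "V")],
    ("fan",): [("帆", "N"), ("飯", "N")],
    ("ta",): [("他", "PRON")],
    ("xi", "huan"): [("喜歡", "V")],
}

# Hypothetical NVEF knowledge: permissible (noun, verb) word pairs.
NVEF_PAIRS = {("飯", "吃")}


def candidates(syllables):
    """Return (start, end, word, pos) for every lexicon entry matching a span."""
    found = []
    n = len(syllables)
    for start in range(n):
        for end in range(start + 1, n + 1):
            for word, pos in LEXICON.get(tuple(syllables[start:end]), []):
                found.append((start, end, word, pos))
    return found


def convert(syllables):
    """Convert syllables to words: NVEF pairs first, fallback for the rest."""
    n = len(syllables)
    cands = candidates(syllables)
    nouns = [c for c in cands if c[3] == "N"]
    verbs = [c for c in cands if c[3] == "V"]

    chosen = {}          # start index -> (end index, word)
    taken = [False] * n  # syllable positions already fixed by an NVEF pair

    # Stage (a): fix spans covered by a known noun-verb pair. This settles
    # both the homonym choice and the segmentation for those spans.
    for ns, ne, noun, _ in nouns:
        for vs, ve, verb, _ in verbs:
            overlap = ns < ve and vs < ne
            if overlap or (noun, verb) not in NVEF_PAIRS:
                continue
            span = list(range(ns, ne)) + list(range(vs, ve))
            if any(taken[i] for i in span):
                continue
            chosen[ns] = (ne, noun)
            chosen[vs] = (ve, verb)
            for i in span:
                taken[i] = True

    # Stage (b): assumed fallback for the non-NVEF portion, here a simple
    # left-to-right longest match that takes the first listed candidate.
    out = []
    i = 0
    while i < n:
        if i in chosen:
            end, word = chosen[i]
        else:
            end, word = i + 1, syllables[i]  # unknown syllable passes through
            for j in range(n, i, -1):
                if any(taken[i:j]):
                    continue  # do not cross into a span fixed in stage (a)
                entries = LEXICON.get(tuple(syllables[i:j]))
                if entries:
                    end, word = j, entries[0][0]
                    break
        out.append(word)
        i = end
    return out


print(convert(["ta", "xi", "huan", "chi", "fan"]))
```

In this toy setting the fallback alone would pick the first listed homonyms for "chi" and "fan", while the noun-verb pair selects 吃 and 飯 and leaves the remaining syllables to the fallback.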