High speed unknown word prediction using support vector machine for Chinese text-to-speech systems

  • Authors:
  • Juhong Ha;Yu Zheng;Byeongchang Kim;Gary Geunbae Lee;Yoon-Suk Seong

  • Affiliations:
  • Department of Computer Science & Engineering, Pohang University of Science & Technology, Pohang, South Korea;Department of Computer Science & Engineering, Pohang University of Science & Technology, Pohang, South Korea;Division of Computer and Multimedia Engineering, UIDUK University, Gyeongju, South Korea;Department of Computer Science & Engineering, Pohang University of Science & Technology, Pohang, South Korea;Division of the Japanese and Chinese Languages, UIDUK University, Gyeongju, South Korea

  • Venue:
  • IJCNLP'04 Proceedings of the First International Joint Conference on Natural Language Processing
  • Year:
  • 2004

Abstract

One of the most significant problems in POS (part-of-speech) tagging of Chinese text is identifying the words in a sentence, since no spaces delimit them. Because it is impossible to register every word in a dictionary in advance, unknown words inevitably occur during this process. The unknown word problem therefore has a marked effect on the accuracy of the synthesized speech in a Chinese TTS (text-to-speech) system. In this paper, we present an SVM (support vector machine) based method that predicts unknown words from the output of word segmentation and tagging. To reach the processing speed a TTS system requires, we detect the candidate boundaries of unknown words before the actual prediction begins. Unknown word prediction is thus performed in two phases: detection, then prediction. The experimental results are very promising, showing high precision and high recall together with high speed.
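
The two-phase control flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the detection heuristic (flagging runs of consecutive single-character tokens left by the segmenter), the function names `detect_candidates` and `predict_unknown_words`, and the `is_word` callable, which stands in for the trained SVM classifier. The sketch only shows why pre-detection saves time: the comparatively expensive classifier runs on the candidate spans rather than on every position in the sentence.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of token indices


def detect_candidates(tokens: List[str], min_run: int = 2) -> List[Span]:
    """Phase 1 (assumed heuristic): flag runs of single-character tokens."""
    spans: List[Span] = []
    start = None
    # The empty-string sentinel closes a run that reaches the end of the sentence.
    for i, tok in enumerate(tokens + [""]):
        if len(tok) == 1:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                spans.append((start, i))
            start = None
    return spans


def predict_unknown_words(
    tokens: List[str],
    is_word: Callable[[str], bool],  # hypothetical stand-in for the SVM
    min_run: int = 2,
) -> List[str]:
    """Phase 2: call the classifier only on the detected candidate spans."""
    out: List[str] = []
    pos = 0
    for start, end in detect_candidates(tokens, min_run):
        out.extend(tokens[pos:start])
        merged = "".join(tokens[start:end])
        if is_word(merged):
            out.append(merged)  # accept the span as one unknown word
        else:
            out.extend(tokens[start:end])  # keep the original segmentation
        pos = end
    out.extend(tokens[pos:])
    return out


# Toy usage: the set lookup below replaces a trained classifier.
segmented = ["我们", "访问", "浦", "项", "大学"]
print(predict_unknown_words(segmented, lambda s: s in {"浦项"}))
# prints ['我们', '访问', '浦项', '大学']
```

This sketch tests each candidate span only as a whole; a fuller system would presumably also score sub-spans and assign a POS tag to each predicted word.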