High speed unknown word prediction using support vector machine for Chinese text-to-speech systems

Authors:
Juhong Ha;Yu Zheng;Byeongchang Kim;Gary Geunbae Lee;Yoon-Suk Seong
Affiliations:
Department of Computer Science & Engineering, Pohang University of Science & Technology, Pohang, South Korea;Department of Computer Science & Engineering, Pohang University of Science & Technology, Pohang, South Korea;Division of Computer and Multimedia Engineering, UIDUK University, Gyeongju, South Korea;Department of Computer Science & Engineering, Pohang University of Science & Technology, Pohang, South Korea;Division of the Japanese and Chinese Languages, UIDUK University, Gyeongju, South Korea
Venue:
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
Year:
2004


Abstract

One of the most significant problems in POS (part-of-speech) tagging of Chinese text is identifying the words in a sentence, since no blanks delimit them. Because it is impossible to pre-register every word in a dictionary, unknown words inevitably occur during this process, and they have a marked effect on the accuracy of the sound produced by a Chinese TTS (text-to-speech) system. In this paper, we present an SVM (support vector machine) based method that predicts unknown words from the output of word segmentation and tagging. To reach the processing speed a TTS system requires, we pre-detect the candidate boundaries of unknown words before starting the actual prediction, so unknown word prediction proceeds in two phases: detection and prediction. Experimental results are promising, showing high precision and high recall together with high speed.
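
The two-phase idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how candidates are detected or which features the SVM uses. The sketch assumes, purely for illustration, that a run of consecutive single-character tokens in the segmenter output marks a candidate, and it takes a hypothetical `classify` callback as a stand-in for a trained SVM. The names `detect_candidates`, `predict_unknown_words`, and `min_run` are likewise invented here.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open [start, end) indices into the token list


def detect_candidates(tokens: List[str], min_run: int = 2) -> List[Span]:
    """Phase 1 (detection): cheaply flag candidate unknown-word spans.

    Assumption for this sketch only: a run of at least `min_run`
    consecutive single-character tokens is treated as a candidate.
    """
    spans: List[Span] = []
    start = None
    # The empty-string sentinel closes a run that reaches the end of the list.
    for i, tok in enumerate(tokens + [""]):
        if len(tok) == 1:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                spans.append((start, i))
            start = None
    return spans


def predict_unknown_words(
    tokens: List[str],
    classify: Callable[[List[str]], bool],
    min_run: int = 2,
) -> List[str]:
    """Phase 2 (prediction): call the classifier only on detected candidates.

    `classify` is a hypothetical stand-in for a trained SVM; it returns True
    when the candidate tokens should be merged into one unknown word.
    """
    result: List[str] = []
    cursor = 0
    for start, end in detect_candidates(tokens, min_run):
        result.extend(tokens[cursor:start])  # tokens before the candidate
        candidate = tokens[start:end]
        if classify(candidate):
            result.append("".join(candidate))  # merge into a single word
        else:
            result.extend(candidate)  # leave the segmentation unchanged
        cursor = end
    result.extend(tokens[cursor:])  # tokens after the last candidate
    return result


if __name__ == "__main__":
    segmented = ["我", "喜欢", "阿", "凡", "达", "电影"]
    # A trivial classifier that accepts every candidate, for demonstration.
    print(predict_unknown_words(segmented, lambda cand: True))
```

The speed benefit of such a design comes from running the comparatively expensive classifier only on the spans that the cheap first pass flags, rather than on every position in the sentence.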