One of the most significant problems in part-of-speech (POS) tagging of Chinese text is identifying the words in a sentence, since no blanks delimit them. Because it is impossible to pre-register every word in a dictionary, unknown words inevitably occur during this process, and they have a marked effect on the accuracy of the speech produced by a Chinese text-to-speech (TTS) system. In this paper, we present a support vector machine (SVM) based method that predicts unknown words from the output of word segmentation and tagging. To reach the processing speed a TTS system requires, we detect the candidate boundaries of unknown words before starting the actual prediction. Unknown word prediction is therefore performed in two phases: detection and prediction. The experimental results are promising, showing high precision and high recall at high speed.
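The two-phase flow described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes, purely for illustration, that phase 1 flags runs of adjacent single-character tokens as candidates, and that phase 2 is any callable returning True or False, standing in for a trained SVM. The names `detect_candidates`, `predict_unknown_words`, and the `UNK` tag are hypothetical.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (surface form, POS tag) from segmentation and tagging


def detect_candidates(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Phase 1 (assumed heuristic): return [start, end) spans covering
    maximal runs of two or more adjacent single-character tokens."""
    spans = []
    start = None
    for i, (surface, _tag) in enumerate(tokens):
        if len(surface) == 1:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= 2:
                spans.append((start, i))
            start = None
    if start is not None and len(tokens) - start >= 2:
        spans.append((start, len(tokens)))
    return spans


def predict_unknown_words(
    tokens: List[Token],
    classify: Callable[[List[Token]], bool],
) -> List[Token]:
    """Phase 2: call the classifier only on detected candidates and merge
    each accepted candidate into one token tagged 'UNK' (hypothetical tag)."""
    out: List[Token] = []
    pos = 0
    for start, end in detect_candidates(tokens):
        out.extend(tokens[pos:start])
        candidate = tokens[start:end]
        if classify(candidate):
            out.append(("".join(s for s, _ in candidate), "UNK"))
        else:
            out.extend(candidate)
        pos = end
    out.extend(tokens[pos:])
    return out


# Toy input: the name is split into two single-character tokens.
tokens = [("我们", "r"), ("访问", "v"), ("哈", "x"), ("佛", "x"), ("大学", "n")]
# Stand-in classifier that accepts every candidate; a trained SVM would go here.
merged = predict_unknown_words(tokens, lambda candidate: True)
```

The point of the design is that the classifier, which is the expensive step, runs only on the spans that phase 1 flags and not on every position in the sentence.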