A tutorial on hidden Markov models and selected applications in speech recognition
Readings in speech recognition
Foundations of statistical natural language processing
Modified Kneser-Ney Smoothing of n-gram Models
A new statistical approach to Chinese Pinyin input
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
A segment-based hidden Markov model for real-setting Pinyin-to-Chinese conversion
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
The application of hidden Markov models in speech recognition
Foundations and Trends in Signal Processing
Detecting word misuse in Chinese
WSA '10 Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media
Most current research and applications on Pinyin-to-Chinese word conversion employ a hidden Markov model (HMM) that in turn uses a character-based language model, because Chinese text is written without word boundaries. However, in some tasks that involve Pinyin-to-Chinese conversion, such as Chinese text proofreading, the original Chinese text is known. This allows words to be extracted from the text, so a word-based language model can be built. In this paper we compare the two models and conclude that a word-based bigram language model achieves higher conversion accuracy than a character-based bigram language model.
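The HMM setup described in the abstract can be illustrated with a minimal sketch: hidden states are Chinese characters, observations are Pinyin syllables, and a character-based bigram language model scores transitions. The tiny candidate lexicon and probabilities below are illustrative assumptions, not statistics from any real corpus.

```python
import math

# Candidate characters each Pinyin syllable can emit (toy lexicon, assumed).
candidates = {
    "zhong": ["中", "种"],
    "guo": ["国", "果"],
}

# Character-based bigram language model P(c2 | c1); "<s>" marks the start.
# Probabilities here are made up for illustration.
bigram = {
    ("<s>", "中"): 0.6, ("<s>", "种"): 0.4,
    ("中", "国"): 0.9, ("中", "果"): 0.1,
    ("种", "国"): 0.2, ("种", "果"): 0.8,
}

def viterbi(pinyin_seq):
    """Return the most probable character sequence for a Pinyin sequence."""
    # Map each candidate character to (log-probability, best path so far).
    prev = {"<s>": (0.0, [])}
    for syl in pinyin_seq:
        cur = {}
        for ch in candidates[syl]:
            # Pick the best predecessor under the bigram model;
            # unseen bigrams get a small floor probability.
            score, path = max(
                (p + math.log(bigram.get((pc, ch), 1e-6)), pth + [ch])
                for pc, (p, pth) in prev.items()
            )
            cur[ch] = (score, path)
        prev = cur
    return max(prev.values())[1]

print("".join(viterbi(["zhong", "guo"])))  # → 中国
```

A word-based bigram model, as the paper advocates, would use the same Viterbi search but with whole words as hidden states, which captures longer dependencies at the cost of requiring word-segmented training text.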