A tutorial on hidden Markov models and selected applications in speech recognition
Readings in speech recognition
Foundations of statistical natural language processing
Modified Kneser-Ney Smoothing of n-gram Models
A new statistical approach to Chinese Pinyin input
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
A segment-based hidden Markov model for real-setting Pinyin-to-Chinese conversion
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
The application of hidden Markov models in speech recognition
Foundations and Trends in Signal Processing
Detecting word misuse in Chinese
WSA '10 Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media
Most current research and applications on Pinyin-to-Chinese word conversion employ a hidden Markov model (HMM) that in turn uses a character-based language model, because Chinese text is written without word boundaries. However, in some tasks that involve Pinyin-to-Chinese conversion, such as Chinese text proofreading, the original Chinese text is known. This allows words to be extracted from the text, so a word-based language model can be built. In this paper we compare the two models and conclude that a word-based bigram language model achieves higher conversion accuracy than a character-based bigram language model.
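The HMM setup described in the abstract can be illustrated with a minimal sketch: hidden states are Chinese characters, observations are Pinyin syllables, and a character-based bigram language model scores transitions. The tiny candidate lexicon and probabilities below are illustrative assumptions, not statistics from any real corpus.

```python
import math

# Candidate characters each Pinyin syllable can emit (toy lexicon, assumed).
candidates = {
    "zhong": ["中", "种"],
    "guo": ["国", "果"],
}

# Character-based bigram language model P(c2 | c1); "<s>" marks the start.
# Probabilities here are made up for illustration.
bigram = {
    ("<s>", "中"): 0.6, ("<s>", "种"): 0.4,
    ("中", "国"): 0.9, ("中", "果"): 0.1,
    ("种", "国"): 0.2, ("种", "果"): 0.8,
}

def viterbi(pinyin_seq):
    """Return the most probable character sequence for a Pinyin sequence."""
    # Map each candidate character to (log-probability, best path so far).
    prev = {"<s>": (0.0, [])}
    for syl in pinyin_seq:
        cur = {}
        for ch in candidates[syl]:
            # Pick the best predecessor under the bigram model;
            # unseen bigrams get a small floor probability.
            score, path = max(
                (p + math.log(bigram.get((pc, ch), 1e-6)), pth + [ch])
                for pc, (p, pth) in prev.items()
            )
            cur[ch] = (score, path)
        prev = cur
    return max(prev.values())[1]

print("".join(viterbi(["zhong", "guo"])))  # → 中国
```

A word-based bigram model, as the paper advocates, would use the same Viterbi search but with whole words as hidden states, which captures longer dependencies at the cost of requiring word-segmented training text.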