A novel statistical Chinese language model and its application in pinyin-to-character conversion

Authors:
Bo Lin;Jun Zhang
Affiliations:
Nanyang Technological University, Singapore, Singapore;Nanyang Technological University, Singapore, Singapore
Venue:
Proceedings of the 17th ACM conference on Information and knowledge management
Year:
2008

Abstract

In this paper, we present a novel Chinese language model and study its applications, in particular Chinese pinyin-to-character conversion. In the new model, each word is associated with a supporting context, constructed by mining frequent sets of nearby phrases together with their distances to the word. Such information is usually overlooked by the n-gram model and its variants. We apply the model to Chinese pinyin-to-character conversion and find that it offers a better solution for Chinese input. In our evaluation, the model achieves lower perplexity and higher prediction accuracy than the state-of-the-art n-gram Markov model for Chinese.
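
To make the idea of a supporting context concrete, the following is a minimal Python sketch and not the authors' algorithm. It counts how often each neighboring word appears at each signed distance from a target word and keeps the pairs that meet a support threshold. The abstract describes mining frequent sets of phrases; this sketch only covers the single-item case. The function name `mine_supporting_context`, the parameters `window` and `min_support`, and the toy pre-segmented corpus are all assumptions made for illustration.

```python
from collections import Counter


def mine_supporting_context(sentences, target, window=3, min_support=2):
    """Return frequent (neighbor, offset) pairs around `target`.

    Hypothetical helper: `sentences` is a list of token lists, `offset`
    is the signed distance from the target (negative = to the left).
    Only single (word, distance) items are counted, not larger item sets.
    """
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(tokens[j], j - i)] += 1
    # Keep only pairs that occur at least `min_support` times.
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Toy, already-segmented corpus (an assumption; not data from the paper).
corpus = [
    ["我", "在", "银行", "存", "钱"],
    ["他", "去", "银行", "取", "钱"],
    ["她", "在", "银行", "工作"],
]
context = mine_supporting_context(corpus, "银行")
print(context)
```

On this toy corpus, the pairs that survive the threshold are "在" one position to the left and "钱" two positions to the right of "银行". A pinyin-to-character converter could, in principle, use such pairs as extra evidence when choosing among homophones, although how the paper combines them with the base model is not stated in the abstract.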