Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
A maximum entropy approach to natural language processing
Computational Linguistics
Toward a unified approach to statistical language modeling for Chinese
ACM Transactions on Asian Language Information Processing (TALIP)
Unsupervised word sense disambiguation rivaling supervised methods
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
An iterative algorithm to build Chinese language models
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Detecting word misuse in Chinese
WSA '10 Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media
Hi-index | 0.00 |
In this paper, we present a novel Chinese language model, and study its applications, in particular in Chinese pinyin-to-character conversion. In the new model, each word is associated with supporting context constructed by mining the frequent sets of nearby phrases and their distances to the word. Such information was usually overlooked in previous n-gram model and its variants. We apply the model to Chinese pinyin-to-character conversion and find that it offers a better solution to Chinese input. The model has lower perplexity in our evaluation and higher prediction accuracy than the state-of-the-art n-gram Markov model for Chinese language.