The power of amnesia: learning probabilistic automata with variable memory length
Machine Learning - Special issue on COLT '94
Combination of n-grams and Stochastic Context-Free Grammars for language modeling
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Statistics based hybrid approach to Chinese base phrase identification
CLPW '00 Proceedings of the second workshop on Chinese language processing: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 12
In statistical language modeling, integrating diverse linguistic knowledge into a general framework that captures long-distance dependencies is a challenging problem. This paper presents an improved language model that incorporates linguistic structure into the maximum entropy framework. The proposed model combines a trigram model with base-phrase structure knowledge: the trigram captures local relations between words, while the base-phrase structure represents long-distance relations between syntactic constituents. Syntactic, semantic, and lexical knowledge is integrated into the maximum entropy framework. Experimental results show that, compared with the trigram model, the proposed model reduces perplexity by 24% and improves the sign language recognition rate by about 3%.
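To illustrate the kind of combination the abstract describes, here is a minimal sketch of a log-linear (maximum entropy) model in which a local trigram feature and a long-distance base-phrase feature jointly score candidate words. The vocabulary, feature functions, and weights are illustrative assumptions, not details from the paper.

```python
import math

# Toy vocabulary (hypothetical; for illustration only).
VOCAB = ["the", "dog", "barks", "runs"]

def trigram_feature(w2, w1, w):
    # Local feature: fires for one specific trigram pattern
    # (a stand-in for the paper's trigram knowledge).
    return 1.0 if (w2, w1, w) == ("the", "dog", "barks") else 0.0

def base_phrase_feature(phrase_tag, w):
    # Long-distance feature: fires when the candidate word is
    # compatible with the preceding base-phrase tag (hypothetical).
    return 1.0 if phrase_tag == "NP" and w in ("barks", "runs") else 0.0

def maxent_prob(w, w2, w1, phrase_tag, lambdas):
    """P(w | history) = exp(sum_i lambda_i * f_i) / Z(history),
    the standard maximum entropy form combining both feature families."""
    def score(cand):
        return (lambdas["tri"] * trigram_feature(w2, w1, cand)
                + lambdas["bp"] * base_phrase_feature(phrase_tag, cand))
    z = sum(math.exp(score(cand)) for cand in VOCAB)  # normalizer
    return math.exp(score(w)) / z

# Illustrative weights; in practice these are estimated, e.g. by GIS.
lambdas = {"tri": 1.5, "bp": 0.8}
p = maxent_prob("barks", "the", "dog", "NP", lambdas)
```

In a real system the weights would be trained (e.g. with generalized iterative scaling) on corpus data, and the feature set would cover many trigrams and base-phrase configurations rather than the two hand-picked features above.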