In this paper, we discuss the statistical behavior and the entropies of three smoothing methods: two well-known methods and one proposed method, applied to three language models on Mandarin data sets. Because of data sparseness, smoothing methods are employed to estimate the probability of each event (both seen and unseen) in a language model. We propose a set of properties for analyzing the statistical behavior of the three smoothing methods; the proposed smoothing method satisfies all of them. We implement the three language models on the Mandarin data sets and then examine their entropies. In general, the entropies obtained with the proposed smoothing method are lower, for all three models, than those of the other two methods.
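To illustrate why smoothing is needed under data sparseness, the sketch below uses add-k (Laplace) smoothing on a bigram model. This is a generic textbook method chosen for illustration, not the smoothing method proposed in the paper; the corpus and function names are hypothetical.

```python
from collections import Counter

def bigram_probs_add_k(corpus, vocab, k=1.0):
    """Estimate bigram probabilities with add-k (Laplace) smoothing,
    so that unseen bigrams receive nonzero probability mass."""
    bigrams = Counter(zip(corpus, corpus[1:]))
    histories = Counter(corpus[:-1])  # counts of each bigram history
    V = len(vocab)

    def p(w_prev, w):
        # Add k to every bigram count; normalize by history count plus k*V.
        return (bigrams[(w_prev, w)] + k) / (histories[w_prev] + k * V)

    return p

# Toy corpus: "cat the" never occurs, yet still gets nonzero probability.
corpus = ["the", "cat", "sat", "the", "cat", "ran"]
vocab = set(corpus)
p = bigram_probs_add_k(corpus, vocab, k=1.0)
assert p("cat", "the") > 0                    # unseen event, nonzero mass
assert p("the", "cat") > p("cat", "the")      # seen event still favored
# Smoothed distribution remains properly normalized for any history.
assert abs(sum(p("cat", w) for w in vocab) - 1.0) < 1e-9
```

The per-history normalization check is the kind of property one would verify when analyzing the statistical behavior of a smoothing method: the smoothed estimates must still form a valid probability distribution.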