Discrete visual features modeling via leave-one-out likelihood estimation and applications
Journal of Visual Communication and Image Representation
Cooccurrence smoothing for stochastic language modeling
ICASSP'92 Proceedings of the 1992 IEEE international conference on Acoustics, speech and signal processing - Volume 1
Automatic word classification using simulated annealing
ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: speech processing - Volume II
On the dynamic adaptation of stochastic language models
ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: speech processing - Volume II
Statistical behavior analysis of smoothing methods for language models of mandarin data sets
AIRS'06 Proceedings of the Third Asia conference on Information Retrieval Technology
Hi-index | 0.00 |
The authors study various problems related to smoothing bigram probabilities for natural language modeling: the type of interpolation, i.e. linear vs. nonlinear, the optimal estimation of interpolation parameters, and the use of word equivalence classes (parts of speech). A nonlinear interpolation method that results in significant improvements over linear interpolation in the experimental tests is proposed. It is shown that the leaving-one-out method in combination with the maximum likelihood criterion can be efficiently used for the optimal estimation of interpolation parameters. In addition, an automatic clustering procedure is developed for finding word equivalence classes using a maximum likelihood criterion. Experimental results are presented for two text databases: a German database with 100000 words and an English database with 1.1 million words.