Automatic word classification using simulated annealing

Authors:
Michèle Jardino;Gilles Adda
Affiliations:
LIMSI-CNRS, Orsay Cedex, France;LIMSI-CNRS, Orsay Cedex, France
Venue:
ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: speech processing - Volume II
Year:
1993

Citing 2
Cited 0

Simulated annealing: theory and applications
On smoothing techniques for bigram-based natural language modelling

ICASSP '91 Proceedings of the Acoustics, Speech, and Signal Processing, 1991. ICASSP-91., 1991 International Conference

Abstract

In theory, continuous speech recognition over very large vocabularies can be improved by a language model that specifies the a priori conditional probability of a word given the preceding word sequence. Because such a model is impractical to implement, more realistic solutions have been proposed, such as n-gram word models [3] or n-gram class models [4], which limit the length of the word sequence to n items. We built a bigram class model that gives the probability of a word class given its predecessor class. A stochastic method, simulated annealing, is used to classify the words of large text corpora automatically. We present a first validation of the use of simulated annealing in language modelling. Results are reported for a French corpus of 40000 words and a German corpus of 100000 words, and are compared with those of another statistical clustering method.
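
The idea of classifying words by simulated annealing under a bigram class model can be illustrated with a minimal sketch. The abstract does not state the objective function, the move set, or the cooling schedule, so everything below is an assumption made for illustration: the objective is the corpus log-likelihood under P(w2 | w1) = P(c2 | c1) · P(w2 | c2) with maximum-likelihood estimates, a move reassigns one random word to a random class, and the temperature decays geometrically. The names `log_likelihood`, `anneal`, `t0`, and `cooling` are hypothetical and not taken from the paper.

```python
import math
import random
from collections import Counter


def log_likelihood(bigrams, assign):
    """Log-likelihood of the bigram counts under an assumed class bigram model:
    P(w2 | w1) = P(c2 | c1) * P(w2 | c2), using maximum-likelihood estimates."""
    class_pair = Counter()  # N(c1, c2): class-to-class transition counts
    class_prev = Counter()  # N(c1): count of c1 as predecessor class
    class_next = Counter()  # N(c2): count of c2 as successor class
    word_next = Counter()   # N(w2): count of w2 as successor word
    for (w1, w2), n in bigrams.items():
        c1, c2 = assign[w1], assign[w2]
        class_pair[c1, c2] += n
        class_prev[c1] += n
        class_next[c2] += n
        word_next[w2] += n
    ll = 0.0
    for (c1, _c2), n in class_pair.items():
        ll += n * math.log(n / class_prev[c1])          # log P(c2 | c1)
    for w, n in word_next.items():
        ll += n * math.log(n / class_next[assign[w]])   # log P(w2 | c2)
    return ll


def anneal(tokens, num_classes, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a word-to-class map with high log-likelihood.
    The schedule (t0, cooling) is an illustrative assumption."""
    rng = random.Random(seed)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = sorted(set(tokens))
    assign = {w: rng.randrange(num_classes) for w in vocab}
    current = log_likelihood(bigrams, assign)
    best, best_assign = current, dict(assign)
    temp = t0
    for _ in range(steps):
        # Propose moving one random word to a random class.
        w = rng.choice(vocab)
        old = assign[w]
        assign[w] = rng.randrange(num_classes)
        candidate = log_likelihood(bigrams, assign)
        delta = candidate - current
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if current > best:
                best, best_assign = current, dict(assign)
        else:
            assign[w] = old  # reject the move
        temp *= cooling
    return best_assign, best


if __name__ == "__main__":
    toy = "the cat sat on the mat the dog sat on the rug".split()
    classes, score = anneal(toy, num_classes=3)
    print(classes, score)
```

For brevity the sketch recomputes the likelihood from scratch after every move, which is only practical on toy data; on corpora of the size reported here one would presumably update the counts incrementally for the single word that moved.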