Automatic word classification using simulated annealing

  • Authors:
  • Michèle Jardino; Gilles Adda

  • Affiliations:
  • LIMSI-CNRS, Orsay Cedex, France; LIMSI-CNRS, Orsay Cedex, France

  • Venue:
  • ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: speech processing - Volume II
  • Year:
  • 1993

Abstract

Continuous speech recognition with very large vocabularies can, in theory, be improved by a language model that specifies the a priori conditional probability of a word given the preceding word sequence. Since such a model is unrealistic to implement in practice, more tractable solutions have been proposed, such as n-gram word models [3] or n-gram class models [4], which limit the length of the word sequence to n items. We built a bigram class model, which gives the probability of a word class given its predecessor class. A stochastic method, simulated annealing, is used to automatically classify the words of large text corpora. We present here a first validation of the use of simulated annealing in language modelling. Results are presented for a French corpus of 40000 words and a German corpus of 100000 words, together with a comparison with another statistical clustering method.
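
To make the idea concrete, here is a minimal Python sketch of word classification by simulated annealing under a bigram class model. The abstract does not give the objective function, the move set, or the cooling schedule, so everything below is an assumption made for illustration: the objective is the corpus log-likelihood under P(w_i | w_{i-1}) = P(c_i | c_{i-1}) · P(w_i | c_i) with maximum-likelihood estimates, a move reassigns one randomly chosen word to a random class, and the temperature decays geometrically. The names `class_bigram_loglik` and `anneal_classes` and all parameter values are hypothetical, not taken from the paper.

```python
import math
import random
from collections import Counter


def class_bigram_loglik(tokens, assign):
    """Corpus log-likelihood under an assumed class bigram model:
    P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i), ML estimates."""
    classes = [assign[w] for w in tokens]
    pair = Counter(zip(classes, classes[1:]))   # N(c_prev, c_cur)
    prev = Counter(classes[:-1])                # N(c_prev) as predecessor
    cur_class = Counter(classes[1:])            # N(c_cur) as successor
    cur_word = Counter(tokens[1:])              # N(w) as successor
    ll = 0.0
    for (c1, _c2), n in pair.items():
        ll += n * math.log(n / prev[c1])        # class transition term
    for w, n in cur_word.items():
        ll += n * math.log(n / cur_class[assign[w]])  # word emission term
    return ll


def anneal_classes(tokens, n_classes, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a word -> class map by simulated annealing.
    Move set and geometric cooling are illustrative assumptions."""
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    assign = {w: rng.randrange(n_classes) for w in vocab}  # random start
    current = class_bigram_loglik(tokens, assign)
    best, best_assign = current, dict(assign)
    temp = t0
    for _ in range(steps):
        w = rng.choice(vocab)                   # propose: move one word
        old = assign[w]
        assign[w] = rng.randrange(n_classes)
        proposed = class_bigram_loglik(tokens, assign)
        delta = proposed - current
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = proposed
            if current > best:
                best, best_assign = current, dict(assign)
        else:
            assign[w] = old                     # reject: undo the move
        temp *= cooling                         # geometric cooling
    return best_assign, best


if __name__ == "__main__":
    corpus = "the cat sat on the mat the dog sat on the rug".split()
    classes, loglik = anneal_classes(corpus, n_classes=3)
    print(classes, loglik)
```

This sketch recomputes the likelihood from scratch after every move, which is only practical for toy corpora; a corpus of tens of thousands of words would call for incremental updates of the affected counts.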