Experiments with tree-structured MMI encoders on the RM task

  • Authors:
  • Mark T. Anikst; William S. Meisel; Matthew C. Soares

  • Venue:
  • HLT '90 Proceedings of the workshop on Speech and Natural Language
  • Year:
  • 1990

Abstract

This paper describes the tree-structured maximum mutual information (MMI) encoders used in SSI's Phonetic Engine® to perform large-vocabulary, continuous speech recognition. The MMI encoders are arranged in a two-stage cascade. At each stage, the encoder is trained to maximize the mutual information between a set of phonetic targets and the corresponding codes. After each stage, the codes are compressed into segments; this step expands acoustic-phonetic context and reduces subsequent computation. We evaluated these MMI encoders by comparing them against a standard minimum distortion (MD) vector quantizer (encoder). Both encoders produced code streams that were used to train speaker-independent discrete hidden Markov models in a simplified version of the Sphinx system [3]. We used data from the DARPA Resource Management (RM) task. The two-stage cascade of MMI encoders significantly outperforms the standard MD encoder in both speed and accuracy.
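The core idea behind an MMI encoder, as the abstract sketches it, is to choose encoder parameters (e.g., the split at each tree node) that maximize the empirical mutual information between phonetic targets and the resulting codes. The following toy sketch illustrates that criterion on a single scalar feature; it is not SSI's actual encoder (which operates on vector-valued acoustic features and a full tree), and the function names `mutual_information` and `best_mmi_split` are illustrative inventions, not from the paper.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information (bits) between targets and codes,
    estimated from a list of (target, code) pairs."""
    n = len(pairs)
    joint = Counter(pairs)                    # counts of (target, code)
    t_marg = Counter(t for t, _ in pairs)     # target marginal counts
    c_marg = Counter(c for _, c in pairs)     # code marginal counts
    mi = 0.0
    for (t, c), cnt in joint.items():
        # p(t,c) * log2( p(t,c) / (p(t) p(c)) ), with counts cancelling n
        mi += (cnt / n) * math.log2(cnt * n / (t_marg[t] * c_marg[c]))
    return mi

def best_mmi_split(xs, ys):
    """Greedy MMI node split: pick the scalar threshold whose binary code
    assignment maximizes I(target; side of split)."""
    best_t, best_mi = None, -1.0
    for t in sorted(set(xs))[:-1]:            # candidate thresholds
        codes = [0 if x <= t else 1 for x in xs]
        mi = mutual_information(list(zip(ys, codes)))
        if mi > best_mi:
            best_t, best_mi = t, mi
    return best_t, best_mi

# Toy data: a feature that separates two phonetic targets cleanly.
xs = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
ys = ['aa', 'aa', 'aa', 'iy', 'iy', 'iy']
threshold, mi = best_mmi_split(xs, ys)
```

On this toy data the best threshold separates the two targets perfectly, yielding 1 bit of mutual information; a real tree-structured encoder would apply such splits recursively, with the leaves serving as the code inventory.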