Analysis of Information in Speech and Its Application in Speech Recognition

  • Authors:
  • Sachin S. Kajarekar;Hynek Hermansky

  • Affiliations:
  • -;-

  • Venue:
  • TSD '00 Proceedings of the Third International Workshop on Text, Speech and Dialogue
  • Year:
  • 2000

Quantified Score

Hi-index 0.04

Abstract

Previous work analyzed the information in speech using analysis of variance (ANOVA). ANOVA assumes that the sources of information (phone, speaker, and channel) are univariate Gaussian. These sources, however, are not unimodal Gaussian. Phones in speech recognition, for example, are generally modeled using a multi-state, multi-mixture model. Therefore, this work extends ANOVA by modeling phones with a 3-state, single-mixture distribution and with a 5-state, single-mixture distribution. The multi-state model was obtained by extracting the variability due to position within the phone from the error term in ANOVA. Further, linear discriminant analysis (LDA) is used to design discriminant features that better represent both the phone-induced variability and the position-within-phone variability. These features perform significantly better than conventional discriminant features obtained from a 1-state phone model on a continuous digit recognition task.
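
The idea of treating position within the phone as part of the class label can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how frames are assigned to positions, so the equal-length split of each phone segment, the toy data, and the names state_labels, lda, and make_toy_data are all assumptions made for illustration.

```python
import numpy as np


def state_labels(phone_ids, segment_lengths, n_states=3):
    """Label every frame with a (phone, position) class id.

    Assumption: each phone segment is split into n_states roughly
    equal parts; with n_states=1 this reduces to plain phone labels.
    """
    labels = []
    for phone, length in zip(phone_ids, segment_lengths):
        # Position index 0..n_states-1 for each frame of the segment.
        pos = (np.arange(length) * n_states) // length
        labels.append(phone * n_states + pos)
    return np.concatenate(labels)


def lda(X, y, n_components):
    """Textbook LDA: directions maximizing between-class scatter
    relative to within-class scatter."""
    d = X.shape[1]
    mean = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Whiten with the Cholesky factor of Sw so that the remaining
    # eigenproblem is symmetric; a small jitter keeps Sw invertible.
    L = np.linalg.cholesky(Sw + 1e-8 * np.eye(d))
    L_inv = np.linalg.inv(L)
    vals, vecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(vals)[::-1][:n_components]
    return L_inv.T @ vecs[:, order], vals[order]


def make_toy_data(n_segments=60, n_phones=4, dim=8, seed=0):
    """Synthetic frames whose mean drifts across each phone segment
    (hypothetical data, standing in for real speech features)."""
    rng = np.random.default_rng(seed)
    phone_ids = rng.integers(0, n_phones, size=n_segments)
    lengths = rng.integers(6, 15, size=n_segments)
    phone_means = rng.normal(size=(n_phones, dim))
    drift = rng.normal(size=(n_phones, dim))
    frames = []
    for p, n in zip(phone_ids, lengths):
        t = np.linspace(-1.0, 1.0, n)[:, None]
        noise = 0.3 * rng.normal(size=(n, dim))
        frames.append(phone_means[p] + t * drift[p] + noise)
    return np.vstack(frames), phone_ids, lengths


if __name__ == "__main__":
    X, phones, lengths = make_toy_data()
    y1 = state_labels(phones, lengths, n_states=1)  # 1-state phone classes
    y3 = state_labels(phones, lengths, n_states=3)  # 3-state phone classes
    W1, _ = lda(X, y1, n_components=3)
    W3, _ = lda(X, y3, n_components=5)
    print(W1.shape, W3.shape)
```

With C classes, LDA yields at most C - 1 useful discriminants, so the 1-state labelling of the four toy phones supports only three directions, whereas the 3-state labelling (twelve classes) supports more and lets the projection capture variation along the phone as well as between phones.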