Analysis of Information in Speech and Its Application in Speech Recognition

Authors:
Sachin S. Kajarekar;Hynek Hermansky
Affiliations:
-;-
Venue:
TDS '00 Proceedings of the Third International Workshop on Text, Speech and Dialogue
Year:
2000

Citing 2
Cited 0

Introduction to statistical pattern recognition (2nd ed.)

Introduction to statistical pattern recognition (2nd ed.)
Elements of information theory

Elements of information theory

Quantified Score

Hi-index	0.04

Visualization

Abstract

Previous work analyzed the information in speech using analysis of variance (ANOVA).ANOVAassumes that sources of information (phone, speaker, and channel) are univariate gaussian. The sources of information, however, are not unimodal gaussian. Phones in speech recognition, e.g., are generally modeled using a multi-state, multi-mixture model. Therefore, this work extends ANOVA by assuming phones with 3 state, single mixture distribution and 5 state, single mixture distribution. This multi-state model was obtained by extracting variability due to position within phone from the error term in ANOVA. Further, linear discriminant analysis (LDA) is used to design discriminant features that better represent both the phone-induced variability and the position-within-phone variability. These features perform significantly better than conventional discriminant features obtained from 1-state phone model on continuous digit recognition task.