When a mismatch can be good: large vocabulary speech recognition trained with idealized tandem features

Authors:
Arlo Faria; Nelson Morgan
Affiliations:
University of California at Berkeley, Berkeley, CA; International Computer Science Institute, Berkeley, CA
Venue:
Proceedings of the 2008 ACM Symposium on Applied Computing
Year:
2008

Abstract

This paper explores Tandem feature extraction in a large-vocabulary speech recognition system. In this framework, a multi-layer perceptron (MLP) estimates phone probabilities, which are then treated as acoustic observations in a conventional HMM-GMM system. To determine a lower bound on error, we simulated an idealized classifier based on an alignment of the reference transcriptions. This cheating experiment demonstrated a best-case scenario for Tandem feature extraction and highlighted the potential for dramatic system improvement. More importantly, we discovered a way to exploit this result without cheating: when the simulated classifier was used during training and an MLP classifier at test time, performance improved despite the mismatch between the training and test Tandem features.
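The idealized ("cheating") classifier and the Tandem post-processing can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The idealized classifier is modeled here as a smoothed one-hot posterior built from a frame-level reference alignment. The `floor` value is a hypothetical choice. The post-processing (log of the posteriors followed by per-dimension mean normalization) follows common Tandem practice rather than anything specified in this abstract, and the decorrelating transform (e.g., PCA) usually applied before the HMM-GMM is omitted to keep the example dependency-free. The function names `idealized_posteriors` and `tandem_features` are hypothetical.

```python
import math
from typing import List, Sequence


def idealized_posteriors(
    alignment: Sequence[int], num_phones: int, floor: float = 1e-3
) -> List[List[float]]:
    """Simulate a 'cheating' phone classifier from a reference alignment.

    Each frame puts mass (1 - floor) on its aligned phone and spreads the
    remaining `floor` evenly over the other phones. Every row sums to 1 and
    no entry is zero, so the log stays finite. `floor` is a hypothetical
    smoothing constant, not a value taken from the paper.
    """
    if num_phones < 2:
        raise ValueError("need at least two phone classes")
    other = floor / (num_phones - 1)
    frames = []
    for phone in alignment:
        if not 0 <= phone < num_phones:
            raise ValueError(f"phone index {phone} out of range")
        row = [other] * num_phones
        row[phone] = 1.0 - floor
        frames.append(row)
    return frames


def tandem_features(posteriors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Turn per-frame phone posteriors into Tandem-style observations.

    Takes the log of each posterior and subtracts the per-dimension mean
    over all frames. A decorrelating transform such as PCA is commonly
    applied next; it is omitted here (assumption: not needed for the sketch).
    """
    if not posteriors:
        return []
    logs = [[math.log(p) for p in row] for row in posteriors]
    n, dims = len(logs), len(logs[0])
    means = [sum(row[d] for row in logs) / n for d in range(dims)]
    return [[row[d] - means[d] for d in range(dims)] for row in logs]


if __name__ == "__main__":
    # Toy alignment over 3 phone classes (hypothetical data).
    train_post = idealized_posteriors([0, 0, 2, 1], num_phones=3, floor=0.03)
    for frame in tandem_features(train_post):
        print([round(x, 3) for x in frame])
```

In the mismatched setup the abstract describes, the HMM-GMM would be trained on features computed this way from training-set alignments, while at test time the same `tandem_features` transform would be applied to real MLP posteriors, which are not sharply one-hot.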