Discovering Cues to Error Detection in Speech Recognition Output: A User-Centered Approach
Journal of Management Information Systems
Active learning with semi-automatic annotation for extractive speech summarization
ACM Transactions on Speech and Language Processing (TSLP)
This paper addresses the correct choice and combination of confidence measures in large vocabulary speech recognition tasks. We classify single words within continuous, large vocabulary utterances into two categories: utterances within the vocabulary that are recognized correctly, and all other utterances, namely misrecognized utterances or (less frequently) out-of-vocabulary (OOV) utterances. To this end, we investigate the classification error rate (CER) of several classes of confidence measures and transformations. In particular, we employ both data-independent and data-dependent measures. The transformations we investigate include mappings to single confidence measures and linear combinations of these measures. The combinations are computed by means of neural networks trained with Bayes-optimal and with Gardner-Derrida-optimal criteria. Compared to a recognition system without confidence measures, selecting (various combinations of) confidence measures, together with suitable neural network architectures and training methods, steadily improves the CER.
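
To make the evaluation criterion concrete, the following is a minimal sketch of how a classification error rate could be computed for a single confidence measure and for a linear combination of measures. It is not the paper's method: the accept/reject threshold rule, the function names, the hand-picked weights, and the toy scores are all assumptions chosen for illustration, and no neural network training is performed here.

```python
from typing import List, Sequence


def classification_error_rate(scores: Sequence[float],
                              correct: Sequence[bool],
                              threshold: float) -> float:
    """Fraction of words whose accept/reject decision is wrong.

    Assumption: a word is accepted when its confidence score is at or
    above the threshold; an error is an accepted wrong word or a
    rejected correct word.
    """
    errors = sum(1 for s, c in zip(scores, correct) if (s >= threshold) != c)
    return errors / len(scores)


def baseline_cer(correct: Sequence[bool]) -> float:
    """CER of a recognizer without confidence measures.

    Every word is accepted, so each misrecognized or OOV word is an error.
    """
    return sum(1 for c in correct if not c) / len(correct)


def linear_combination(measures: Sequence[Sequence[float]],
                       weights: Sequence[float],
                       bias: float = 0.0) -> List[float]:
    """Combine several confidence measures per word into one score."""
    return [sum(w * m for w, m in zip(weights, row)) + bias for row in measures]


# Hypothetical toy data: one confidence score per recognized word, and
# whether the word was actually recognized correctly.
toy_scores = [0.9, 0.4, 0.8, 0.3, 0.7]
toy_correct = [True, False, True, False, False]

cer_without_confidence = baseline_cer(toy_correct)
cer_with_threshold = classification_error_rate(toy_scores, toy_correct, 0.5)

# Hypothetical second measure, combined with hand-picked weights.
toy_measures = [(0.9, 0.8), (0.4, 0.2), (0.8, 0.9), (0.3, 0.1), (0.7, 0.2)]
combined_scores = linear_combination(toy_measures, weights=(0.5, 0.5))
cer_combined = classification_error_rate(combined_scores, toy_correct, 0.5)
```

In this toy setting the combined score separates the last word (a misrecognition with a high first measure but a low second one) from the correct words, which is the kind of gain a learned combination would aim for; in the paper the combination weights come from trained networks rather than being set by hand.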