In this paper, a novel confidence-based reinforcement learning (RL) scheme is proposed to correct observation log-likelihoods and to address the problem of unsupervised compensation with limited estimation data. A two-step Viterbi decoding is presented that uses a confidence score to estimate a correction factor for the observation log-likelihoods, making the recognized and neighboring HMMs more or less likely. If regions of the output delivered by the recognizer exhibit low confidence scores, the second Viterbi decoding tends to focus the search on neighboring models; in contrast, if recognized regions exhibit high confidence scores, the second Viterbi decoding tends to retain the recognition output obtained in the first step. The proposed RL mechanism is modeled as a linear combination of two metrics or information sources: the acoustic model log-likelihood and the logarithm of a confidence metric. A criterion based on incremental conditional entropy maximization is also presented to optimize this linear combination online. The method requires only one utterance, as short as 0.7 s, and can reduce the word error rate (WER) by 3% to 18%, depending on the task, the training-testing conditions, and the method used to optimize the proposed RL scheme. In contrast to ordinary feature compensation and model parameter adaptation methods, the confidence-based RL method operates in the frame log-likelihood domain. Consequently, as shown in the results presented here, it is complementary to feature compensation and model adaptation techniques.
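The core correction the abstract describes — a linear combination of the acoustic model log-likelihood and the logarithm of a confidence metric — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `corrected_log_likelihood`, the weight `alpha`, and the numeric values are all assumptions for demonstration (the paper optimizes the combination weights online via incremental conditional entropy maximization).

```python
import math

def corrected_log_likelihood(acoustic_loglik, confidence, alpha=0.5):
    """Hypothetical per-frame score: weighted sum of the acoustic
    log-likelihood and the log of a confidence metric in (0, 1].
    `alpha` is an assumed combination weight, not a value from the paper."""
    return acoustic_loglik + alpha * math.log(confidence)

# A low-confidence frame receives a larger penalty than a high-confidence
# one, which is the mechanism that steers the second Viterbi pass toward
# neighboring models in low-confidence regions.
high = corrected_log_likelihood(-120.0, 0.9)
low = corrected_log_likelihood(-120.0, 0.1)
assert low < high
```

In a full system, this corrected score would replace the raw observation log-likelihood during the second Viterbi pass, so high-confidence regions keep the first-pass output while low-confidence regions open the search to competing HMMs.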