In this paper, we propose a novel discriminative objective function for estimating hidden Markov model (HMM) parameters, based on the calculation of overall risk. For continuous speech recognition, the algorithm minimises the risk of misclassification on the training database and thus maximises recognition accuracy. The risk is computed from the Levenshtein distance between the correct transcription and the N-best recognised transcriptions, which counts the recognition errors: deleted, substituted and inserted words. The proposed criterion is minimised using the extended Baum-Welch algorithm for the discrete HMM parameters and Normandin's extension of that algorithm for the continuous densities. We tested the proposed algorithm on two tasks: phoneme recognition on the TIMIT database and continuous speech recognition on the Resource Management (RM) database. The results show a consistent decrease in recognition error rate compared with standard maximum likelihood estimation training. The largest decreases in error rate achieved in our experiments are 20.8% on the TIMIT phoneme recognition task, 39.6% on the RM task with context-independent HMMs and 18.22% with context-dependent models.
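To make the risk calculation concrete, the following is a minimal sketch of an N-best risk built on word-level Levenshtein distance. It is not the paper's exact criterion, which the abstract does not spell out. The function names `levenshtein` and `nbest_risk`, the use of log scores, and the normalisation of those scores into posterior-like weights over the N-best list are all assumptions made for illustration.

```python
from math import exp


def levenshtein(ref, hyp):
    """Word-level edit distance between two token lists.

    Counts the minimum number of deleted, substituted and inserted
    words needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distances from the empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing: i deletions
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


def nbest_risk(ref, nbest):
    """Expected error count over an N-best list (illustrative assumption).

    `nbest` is a list of (hypothesis_tokens, log_score) pairs. Scores are
    normalised over the list so that the weights sum to one.
    """
    top = max(score for _, score in nbest)  # subtract the max for stability
    weights = [exp(score - top) for _, score in nbest]
    total = sum(weights)
    return sum(
        (w / total) * levenshtein(ref, hyp)
        for w, (hyp, _) in zip(weights, nbest)
    )


reference = "show all ships".split()
hypotheses = [
    ("show all ships".split(), 0.0),  # correct hypothesis
    ("show ships".split(), 0.0),      # one deleted word, same score
]
risk = nbest_risk(reference, hypotheses)
```

With two equally scored hypotheses, one correct and one with a single deletion, the sketch yields a risk of half an error. Discriminative training in this spirit would adjust the HMM parameters so that probability mass moves towards the hypotheses with fewer errors.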