Improved Generalization Through Explicit Optimization of Margins
Machine Learning
ICME '03 Proceedings of the 2003 International Conference on Multimedia and Expo - Volume 2
Discriminative training for large vocabulary telephone-based name recognition
ICASSP '00 Proceedings of the Acoustics, Speech, and Signal Processing, 2000. on IEEE International Conference - Volume 06
Large-Margin Discriminative Training of Hidden Markov Models for Speech Recognition
ICSC '07 Proceedings of the International Conference on Semantic Computing
Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error
IEEE Transactions on Audio, Speech, and Language Processing
Training data selection for improving discriminative training of acoustic models
Pattern Recognition Letters
A novel framework and training algorithm for variable-parameter hidden Markov models
IEEE Transactions on Audio, Speech, and Language Processing
Journal of Signal Processing Systems
Hi-index | 0.00 |
Large-margin discriminative training of hidden Markov models has received significant attention recently. A natural and interesting question is whether the existing discriminative training algorithms can be extended directly to embed the concept of margin. In this paper, we give this question an affirmative answer by showing that the sigmoid bias in the conventional minimum classification error (MCE) training can be interpreted as a soft margin. We justify this claim from a theoretical classification risk minimization perspective where the loss function associated with a non-zero sigmoid bias is shown to include not only empirical error rates but also a margin-bound risk. Based on this perspective, we propose a practical optimization strategy that adjusts the margin (sigmoid bias) incrementally in the MCE training process so that a desirable balance between the empirical error rates on the training set and the margin can be achieved. We call this modified MCE training process large-margin minimum classification error (LM-MCE) training to differentiate it from the conventional MCE. Speech recognition experiments have been carried out on two tasks. First, in the TIDIGITS recognition task, LM-MCE outperforms the state-of-the-art MCE method with 17% relative digit-error reduction and 19% relative string-error reduction. Second, on the Microsoft internal large vocabulary telephony speech recognition task (with 2000h of training data and 120K words in the vocabulary), significant recognition accuracy improvement is achieved, demonstrating that our formulation of LM-MCE can be successfully scaled up and applied to large-scale speech recognition tasks.