Large-margin minimum classification error training: A theoretical risk minimization perspective

Authors:
Dong Yu; Li Deng; Xiaodong He; Alex Acero
Affiliations:
Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA (all authors)
Venue:
Computer Speech and Language
Year:
2008

Abstract

Large-margin discriminative training of hidden Markov models has received significant attention recently. A natural and interesting question is whether the existing discriminative training algorithms can be extended directly to embed the concept of margin. In this paper, we answer this question in the affirmative by showing that the sigmoid bias in conventional minimum classification error (MCE) training can be interpreted as a soft margin. We justify this claim from a theoretical classification risk minimization perspective, in which the loss function associated with a non-zero sigmoid bias is shown to include not only empirical error rates but also a margin-bound risk. Based on this perspective, we propose a practical optimization strategy that adjusts the margin (sigmoid bias) incrementally during MCE training so that a desirable balance between the empirical error rate on the training set and the margin can be achieved. We call this modified MCE training process large-margin minimum classification error (LM-MCE) training to differentiate it from conventional MCE. Speech recognition experiments have been carried out on two tasks. First, on the TIDIGITS recognition task, LM-MCE outperforms the state-of-the-art MCE method with a 17% relative digit-error reduction and a 19% relative string-error reduction. Second, on the Microsoft internal large-vocabulary telephony speech recognition task (with 2000 hours of training data and a 120K-word vocabulary), LM-MCE achieves a significant improvement in recognition accuracy, demonstrating that our formulation of LM-MCE can be successfully scaled up and applied to large-scale speech recognition tasks.
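
To make the soft-margin reading of the sigmoid bias concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simplified misclassification measure (best competing score minus correct-class score) and one possible parameterization of the bias, in which the sigmoid is shifted by a non-negative `margin`. The function names, the `alpha` slope, and the linear margin schedule are all hypothetical choices made for illustration; the abstract does not specify the exact forms used in the paper.

```python
import math


def misclassification_measure(scores, correct):
    """Assumed simplification: best competing score minus correct-class score.

    A negative value means the token is classified correctly.
    """
    best_competitor = max(s for i, s in enumerate(scores) if i != correct)
    return best_competitor - scores[correct]


def sigmoid_loss(d, alpha=1.0, margin=0.0):
    """Smoothed 0-1 loss with the sigmoid shifted by `margin` (assumed form).

    With margin > 0, tokens that are correct but lie within the margin
    (-margin < d < 0) receive a loss above 0.5.
    """
    return 1.0 / (1.0 + math.exp(-alpha * (d + margin)))


def margin_schedule(start, stop, steps):
    """Hypothetical linear schedule for growing the margin across training passes."""
    if steps < 2:
        return [stop]
    return [start + (stop - start) * k / (steps - 1) for k in range(steps)]


# A token whose correct class (index 0) wins by only 0.5.
d = misclassification_measure([2.0, 1.5, 0.0], correct=0)
loss_no_margin = sigmoid_loss(d, margin=0.0)    # below 0.5: counted as mostly correct
loss_with_margin = sigmoid_loss(d, margin=1.0)  # above 0.5: penalized for the thin margin
```

In this sketch, raising `margin` step by step (for example with `margin_schedule(0.0, 1.0, 5)`) pushes training to separate correct tokens from their competitors by a wider gap, which is the balance between empirical error and margin that the abstract describes.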