Using boosting to improve a hybrid HMM/neural network speech recognizer

  • Authors:
  • H. Schwenk

  • Affiliations:
  • Int. Comput. Sci. Inst., Berkeley, CA, USA

  • Venue:
  • ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 02
  • Year:
  • 1999

Abstract

"Boosting" is a general method for improving the performance of almost any learning algorithm. A previously proposed and very promising boosting algorithm is AdaBoost. In this paper we investigate whether AdaBoost can be used to improve a hybrid HMM/neural network continuous speech recognizer. Boosting significantly improves the word error rate from 6.3% to 5.3% on a test set of the OGI Numbers 95 corpus, a medium-sized continuous numbers recognition task. These results compare favorably with other combining techniques that use several different feature representations or additional information from longer time spans.

The reasons for the impressive success of AdaBoost are still not completely understood, and to the best of our knowledge no application of AdaBoost to a real-world problem has yet been reported in the literature. In this paper we investigate whether AdaBoost can be applied to boost the performance of a continuous speech recognition system, a domain in which we have to deal with large amounts of data (often more than 1 million training examples) and inherently noisy phoneme labels. The paper is organized as follows. We summarize the AdaBoost algorithm and our baseline speech recognizer. We then show how AdaBoost can be applied to this task, report results on the Numbers 95 corpus, and compare them with other classifier combination techniques. The paper closes with conclusions and perspectives for future work.
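The abstract refers to the AdaBoost algorithm without spelling it out. As an illustration only, here is a minimal sketch of the classic binary AdaBoost idea with axis-aligned decision stumps as weak learners (an assumption for compactness; the paper itself boosts neural-network phoneme classifiers, not stumps):

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """Binary AdaBoost with decision stumps; labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)            # example weights, initially uniform
    stumps = []                        # list of (feature, threshold, polarity, alpha)
    for _ in range(n_rounds):
        best, best_err = None, np.inf
        # Exhaustively pick the stump with lowest weighted error.
        for f in range(X.shape[1]):
            for thr in np.unique(X[:, f]):
                for pol in (1, -1):
                    pred = pol * np.where(X[:, f] <= thr, 1, -1)
                    err = w[pred != y].sum()
                    if err < best_err:
                        best_err, best = err, (f, thr, pol)
        eps = max(best_err, 1e-10)                 # avoid log(0) on perfect stumps
        alpha = 0.5 * np.log((1.0 - eps) / eps)    # weight of this weak learner
        f, thr, pol = best
        pred = pol * np.where(X[:, f] <= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)             # upweight misclassified examples
        w /= w.sum()
        stumps.append((f, thr, pol, alpha))
    return stumps

def predict(stumps, X):
    """Weighted vote of all weak learners."""
    score = np.zeros(len(X))
    for f, thr, pol, alpha in stumps:
        score += alpha * pol * np.where(X[:, f] <= thr, 1, -1)
    return np.sign(score)

# Toy usage: a 1-D threshold problem a single stump can already solve.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1, 1, -1, -1])
stumps = train_adaboost(X, y, n_rounds=3)
```

The reweighting step is the core of the method: examples the current ensemble gets wrong receive more weight, so the next weak learner concentrates on them.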