Robust combination of neural networks and hidden Markov models for speech recognition

Authors:
E. Trentin;M. Gori
Affiliations:
Dipt. di Ingegneria dell'Informazione, Siena Univ., Italy;-
Venue:
IEEE Transactions on Neural Networks
Year:
2003

Abstract

Acoustic modeling in state-of-the-art speech recognition systems usually relies on hidden Markov models (HMMs) with Gaussian emission densities. HMMs suffer from intrinsic limitations, mainly due to their arbitrary parametric assumption. Artificial neural networks (ANNs) appear to be a promising alternative in this respect, but they have historically failed as a general solution to the acoustic modeling problem. This paper introduces algorithms based on a gradient-ascent technique for global training of a hybrid ANN/HMM system, in which the ANN is trained to estimate the emission probabilities of the states of the HMM. The approach is related to the major hybrid systems proposed by Bourlard and Morgan and by Bengio, and it aims to combine their benefits within a unified framework while overcoming their limitations. Several viable solutions are proposed to the "divergence problem" that may arise when training is carried out under the maximum-likelihood (ML) criterion. Experimental results in speaker-independent, continuous speech recognition over Italian digit strings validate the novel hybrid framework, which improves recognition performance over HMMs with mixtures of Gaussian components, as well as over Bourlard and Morgan's paradigm. In particular, it is shown that the maximum a posteriori (MAP) version of the algorithm yields a 46.34% relative word error rate reduction with respect to standard HMMs.
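
The idea of global training in a hybrid ANN/HMM system can be illustrated with a toy sketch. The code below is not the authors' algorithm. It assumes a hypothetical two-state HMM, scalar acoustic frames, and a stand-in "network" made of one sigmoid unit per state. The function names, parameter values, and the use of finite differences in place of analytic gradients are all assumptions made for illustration. The sketch shows only that the sequence likelihood computed by the HMM forward recursion can serve as the objective whose gradient updates the emission model's parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def emission_scores(params, frames):
    """Hypothetical stand-in for the ANN: one sigmoid unit per HMM state.

    params is a flat list [w_0, b_0, w_1, b_1, ...]; frames are scalars.
    Returns emissions[t][j], the score of frame t under state j.
    """
    n_states = len(params) // 2
    return [
        [sigmoid(params[2 * j] * x + params[2 * j + 1]) for j in range(n_states)]
        for x in frames
    ]


def forward_log_likelihood(init, trans, emissions):
    """Scaled HMM forward recursion; returns the log-likelihood of the sequence."""
    n = len(init)
    alpha = [init[j] * emissions[0][j] for j in range(n)]
    log_lik = 0.0
    for t in range(len(emissions)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emissions[t][j]
                for j in range(n)
            ]
        scale = sum(alpha)           # rescale to avoid numerical underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def objective(params, frames, init, trans):
    """Sequence log-likelihood as a function of the emission model's parameters."""
    return forward_log_likelihood(init, trans, emission_scores(params, frames))


def gradient_ascent_step(params, frames, init, trans, lr=0.05, eps=1e-5):
    """One ascent step; central finite differences replace analytic gradients here."""
    grad = []
    for k in range(len(params)):
        up = params[:k] + [params[k] + eps] + params[k + 1:]
        down = params[:k] + [params[k] - eps] + params[k + 1:]
        diff = objective(up, frames, init, trans) - objective(down, frames, init, trans)
        grad.append(diff / (2 * eps))
    return [p + lr * g for p, g in zip(params, grad)]


# Illustrative values only (assumptions, not data from the paper).
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.2, 0.8]]
FRAMES = [0.2, 0.9, 0.4]
PARAMS = [0.5, -0.1, -0.3, 0.2]

if __name__ == "__main__":
    params = list(PARAMS)
    print("before:", objective(params, FRAMES, INIT, TRANS))
    for _ in range(10):
        params = gradient_ascent_step(params, FRAMES, INIT, TRANS)
    print("after: ", objective(params, FRAMES, INIT, TRANS))
```

Nothing in this toy constrains the sigmoid outputs to behave like proper probability densities, and it does not reproduce the remedies the paper proposes for the divergence problem under the ML criterion. A practical system would also compute gradients analytically rather than by finite differences.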