Noise-tolerant speech recognition: the SNN-TA approach

Authors:
Edmondo Trentin; Marco Matassoni
Affiliations:
Dipartimento di Ingegneria dell'Informazione, via Roma, 56, 53100 Siena, Italy and ITC-irst Centro per la Ricerca Scientifica e Tecnologica, via Sommarive, 18, 38050 Povo (Trento), Italy; ITC-irst Centro per la Ricerca Scientifica e Tecnologica, via Sommarive, 18, 38050 Povo (Trento), Italy
Venue:
Information Sciences—Informatics and Computer Science: An International Journal - Special issue: Spoken language analysis, modeling and recognition-statistical and adaptive connectionist approaches
Year:
2003

Abstract

Neural network learning theory establishes a relationship between "learning with noise" and adding a regularization term to the cost function that is minimized during training on clean (non-noisy) data. Regularizers and other robust training techniques aim to improve the generalization capabilities of connectionist models by reducing overfitting. In spite of that, the generalization problem is usually overlooked by automatic speech recognition (ASR) practitioners who use hidden Markov models (HMMs) or other standard ASR paradigms. Nonetheless, it is reasonable to expect that an adequate neural network model (due to its universal approximation property and generalization capability), along with a suitable regularizer, can exhibit good recognition performance when noise is added to the test data, even though training is accomplished on clean data. This paper presents applications of a variant of the so-called segmental neural network (SNN), introduced at BBN by Zavaliagkos et al. for rescoring the N-best hypotheses yielded by a standard continuous density HMM (CDHMM). An enhanced connectionist model, called SNN with trainable amplitude of activation functions (SNN-TA), is first used in this paper instead of the CDHMM to perform the recognition of isolated words. Viterbi-based segmentation, relying on the level-building algorithm, is then introduced; it can be combined with the SNN-TA to obtain a hybrid framework for continuous speech recognition. The proposed paradigm is applied to the recognition of isolated and connected Italian digits under several noisy conditions, outperforming the CDHMMs.
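
The link between "learning with noise" and regularization mentioned at the start of the abstract can be illustrated in the simplest possible setting. The sketch below is not the SNN-TA model and is not taken from the paper: it assumes a plain linear model with a sum-of-squares cost, for which adding zero-mean Gaussian noise of standard deviation sigma to every input raises the expected cost by exactly N * sigma^2 * ||w||^2, a weight-decay penalty. All function names and the synthetic data are hypothetical and chosen only for illustration.

```python
import random


def squared_error(X, y, w):
    """Sum of squared residuals of the linear model y ~ X w."""
    return sum(
        (yi - sum(xij * wj for xij, wj in zip(xi, w))) ** 2
        for xi, yi in zip(X, y)
    )


def regularized_error(X, y, w, sigma):
    """Clean-data error plus the penalty N * sigma^2 * ||w||^2."""
    penalty = len(X) * sigma ** 2 * sum(wj ** 2 for wj in w)
    return squared_error(X, y, w) + penalty


def noisy_error(X, y, w, sigma, trials, rng):
    """Monte Carlo estimate of the expected error when zero-mean Gaussian
    noise with standard deviation sigma is added to every input value."""
    total = 0.0
    for _ in range(trials):
        X_noisy = [[xij + rng.gauss(0.0, sigma) for xij in xi] for xi in X]
        total += squared_error(X_noisy, y, w)
    return total / trials


# Synthetic data (assumption: 50 samples, 3 features, mild target noise).
rng = random.Random(0)
w = [1.0, -2.0, 0.5]
X = [[rng.uniform(-1.0, 1.0) for _ in w] for _ in range(50)]
y = [sum(xij * wj for xij, wj in zip(xi, w)) + rng.gauss(0.0, 0.1) for xi in X]

# The two printed values should agree up to Monte Carlo sampling error.
print(regularized_error(X, y, w, sigma=0.5))
print(noisy_error(X, y, w, sigma=0.5, trials=5000, rng=rng))
```

For nonlinear networks the correspondence is only approximate and holds for small noise levels, so the example should be read as intuition for the claim rather than as a description of the training procedure used in the paper.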