Telephone speech recognition using neural networks and hidden Markov models

Authors:
DongSuk Yuk; J. Flanagan
Affiliations:
CAIP Center, Rutgers Univ., Piscataway, NJ, USA;-
Venue:
ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 01
Year:
1999

Abstract

Speech recognizers trained on high-quality, full-bandwidth speech usually perform worse when deployed in real-world environments. Telephone speech recognition is particularly difficult because the transmission channel limits the bandwidth. In this paper, neural network based adaptation methods are applied to telephone speech recognition, and a new unsupervised model adaptation method is proposed. The advantage of the neural network based approach is that it avoids retraining the speech recognizer on telephone speech. Furthermore, because a multi-layer neural network can compute nonlinear functions, it can accommodate the nonlinear mapping between full-bandwidth speech and telephone speech. The new unsupervised model adaptation method does not require transcriptions and can be used together with the neural networks. Experimental results on the TIMIT/NTIMIT corpora show that the performance of the proposed methods is comparable to that of recognizers retrained on telephone speech.
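The idea of learning a nonlinear mapping between telephone and full-bandwidth features can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: the abstract does not state the network architecture, the feature type, or the mapping direction. Here a one-hidden-layer network is trained on paired frames to map "telephone" features back to "full-bandwidth" features. The synthetic data, the hypothetical channel function inside `make_pairs`, and all names (`MLP`, `mse`, the layer sizes, the learning rate) are illustrative stand-ins for parallel corpora such as TIMIT/NTIMIT.

```python
import math
import random

def make_pairs(n, rng):
    """Synthetic stand-in for parallel (telephone, full-bandwidth) frames."""
    pairs = []
    for _ in range(n):
        full = [rng.uniform(-1.0, 1.0) for _ in range(2)]
        # Hypothetical channel: compressive nonlinearity plus a constant offset.
        tel = [math.tanh(1.5 * v) + 0.2 for v in full]
        pairs.append((tel, full))
    return pairs

class MLP:
    """One tanh hidden layer, linear output, trained by plain SGD."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        # Gradient of 0.5 * squared error with respect to the outputs.
        err = [yi - ti for yi, ti in zip(y, target)]
        # Backpropagate through the output weights before they are updated.
        dh = [(1.0 - h[j] ** 2) * sum(err[k] * self.w2[k][j]
                                      for k in range(len(err)))
              for j in range(len(h))]
        for k, e in enumerate(err):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * e * hj
            self.b2[k] -= lr * e
        for j, d in enumerate(dh):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d * xi
            self.b1[j] -= lr * d

def mse(pairs, transform):
    """Mean squared error between transform(telephone) and full-bandwidth."""
    total = 0.0
    for tel, full in pairs:
        out = transform(tel)
        total += sum((o - f) ** 2 for o, f in zip(out, full)) / len(full)
    return total / len(pairs)

rng = random.Random(0)
train, held_out = make_pairs(400, rng), make_pairs(100, rng)
net = MLP(2, 8, 2, rng)

# Baseline: feed telephone features unchanged, as an unadapted recognizer would see them.
baseline_mse = mse(held_out, lambda tel: tel)

for _ in range(100):
    for tel, full in train:
        net.train_step(tel, full, lr=0.05)

mapped_mse = mse(held_out, lambda tel: net.forward(tel)[1])
print(f"unmapped MSE: {baseline_mse:.4f}, mapped MSE: {mapped_mse:.4f}")
```

In this sketch the recognizer itself is never touched: only the small mapping network is trained, which mirrors the stated advantage of avoiding retraining on telephone speech. Training here relies on paired frames, so it does not illustrate the unsupervised model adaptation method, whose details the abstract does not give.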