Blind Model Selection for Automatic Speech Recognition in Reverberant Environments

Authors:
Laurent Couvreur;Christophe Couvreur
Affiliations:
Multitel—TCTS, Faculté Polytechnique de Mons, 1 Avenue Copernic, B-7000 Mons, Belgium;Speech & Language Technology Division, Scansoft, Inc., 32 Guldensporenpark, B-9820 Merelbeke, Belgium
Venue:
Journal of VLSI Signal Processing Systems
Year:
2004

Abstract

This communication presents a new method for automatic speech recognition in reverberant environments. The approach selects the best acoustic model from a library of models, each trained on a speech database artificially reverberated to match a particular reverberant condition. Given a speech utterance recorded in a reverberant room, a maximum likelihood estimate of the fullband room reverberation time is computed using a statistical model of short-term log-energy sequences of anechoic speech. The estimated reverberation time is then used to select the acoustic model trained on the speech database whose reverberation time most closely matches the estimate, and that model is used to recognize the reverberated utterance. The proposed model selection approach is shown to significantly improve recognition accuracy on a connected digit task in both simulated and real reverberant environments, outperforming standard channel normalization techniques.
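
The selection step described in the abstract can be illustrated with a minimal sketch. It assumes the reverberation time has already been estimated and that each model in the library is indexed by the reverberation time of its training data. The function name `select_model`, the library contents, and the model identifiers are hypothetical and are not taken from the paper; the maximum likelihood estimator itself is not shown.

```python
from typing import Dict


def select_model(estimated_t60: float, library: Dict[float, str]) -> str:
    """Return the model whose training reverberation time is closest to the estimate.

    `library` maps a training reverberation time (seconds) to a model identifier.
    This nearest-match rule is an illustrative assumption, not the authors' code.
    """
    if not library:
        raise ValueError("model library is empty")
    # Pick the training condition with the smallest absolute distance to the estimate.
    best_t60 = min(library, key=lambda t60: abs(t60 - estimated_t60))
    return library[best_t60]


# Hypothetical library: training reverberation time in seconds -> model identifier.
MODEL_LIBRARY = {
    0.0: "hmm_anechoic",
    0.3: "hmm_t60_0.3s",
    0.6: "hmm_t60_0.6s",
    0.9: "hmm_t60_0.9s",
}

if __name__ == "__main__":
    # An estimate of 0.42 s lies closer to 0.3 s than to 0.6 s.
    print(select_model(0.42, MODEL_LIBRARY))
```

Estimates outside the range covered by the library fall back to the nearest extreme model, which is one simple design choice among several; a real system might instead interpolate between models or reject out-of-range estimates.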