Training of HMM with filtered speech material for hands-free recognition

Authors:
D. Giuliani, M. Matassoni, M. Omologo, P. Svaizer
Affiliations:
ITC-IRST, Centro per la Ricerca Sci. e Technol., Trento, Italy;-;-;-
Venue:
ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 01
Year:
1999

Abstract

This paper addresses hands-free speech recognition in a noisy office environment. An array of six omnidirectional microphones, followed by a time-delay compensation module, provides a beamformed signal as input to an HMM-based recognizer. The HMMs are trained either on a clean speech database or on a filtered version of the same database. Filtering consists of convolving the clean speech with the acoustic impulse response between the speaker and the microphone, which reproduces the effect of reverberation. Background noise is then added to obtain the desired signal-to-noise ratio (SNR). Models trained on these filtered data perform better than the baseline models trained on clean speech. The paper also investigates maximum likelihood linear regression (MLLR) adaptation of the new models. This adaptation yields a further improvement, reaching a word recognition rate (WRR) of 98.7% on a connected-digit recognition task with the talker 1.5 m from the array.
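One common way to realize a time-delay compensation front end is a delay-and-sum beamformer. Each microphone channel is shifted by its propagation delay so that the talker's wavefronts line up. The aligned channels are then averaged, which reinforces the coherent speech and partially averages out uncorrelated noise. The abstract does not describe the paper's module in detail, so the Python function below is only a minimal sketch under several assumptions. It takes integer-sample, non-negative delays that are already known. The name `delay_and_sum` is hypothetical. Delay estimation itself, for example by cross-correlation between microphone pairs, is not shown.

```python
def delay_and_sum(channels, delays):
    """Align microphone channels by integer-sample delays, then average them.

    Hypothetical helper for illustration, not the paper's implementation.
    channels: equal-length lists of samples, one per microphone.
    delays:   non-negative arrival delay (in samples) of the talker's signal at
              each microphone, relative to the earliest one. Delays are assumed
              known; estimating them is outside this sketch.
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    if min(delays) < 0:
        raise ValueError("delays must be non-negative")
    n = len(channels[0])
    if any(len(ch) != n for ch in channels):
        raise ValueError("all channels must have the same length")
    out_len = n - max(delays)  # samples for which every channel is available
    output = []
    for t in range(out_len):
        # Reading channel k at t + delays[k] undoes its extra propagation delay.
        aligned_sum = sum(ch[t + d] for ch, d in zip(channels, delays))
        output.append(aligned_sum / len(channels))
    return output


# Usage: six microphones receive the same (toy) source with different delays.
source = [float(i % 7 - 3) for i in range(50)]
mic_delays = [0, 1, 2, 3, 4, 5]
mics = [[0.0] * d + source[: len(source) - d] for d in mic_delays]
beamformed = delay_and_sum(mics, mic_delays)
```

With noiseless inputs and correct delays, the output reproduces the source exactly. Suppose instead that each of the M channels carries independent noise of equal power. Averaging then reduces the noise power by a factor of M in expectation, while the aligned speech is unchanged.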
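The training-data filtering described above can be sketched in Python in two steps. First, the clean signal is linearly convolved with a room impulse response. Second, a noise recording is scaled so that the ratio of reverberant-speech power to noise power matches a target SNR, and is then added. This is an illustrative sketch, not the authors' code, and it rests on several assumptions. The helper names (`convolve`, `add_noise_at_snr`, `contaminate`) are hypothetical. The SNR is computed from mean power over the whole signal, since the abstract does not say whether only speech segments are used. The impulse response, noise, and 16 kHz tone in the usage lines are toy values rather than measured data.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution: len(signal) + len(impulse_response) - 1 samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def power(samples):
    """Mean power (average squared amplitude) of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it.

    Assumption: SNR is measured over the whole signal. The noise segment must be
    at least as long as the speech; it is truncated to the speech length.
    """
    if len(noise) < len(speech):
        raise ValueError("noise segment shorter than speech")
    noise = noise[: len(speech)]
    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise segment is silent")
    target_noise_power = power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def contaminate(clean, impulse_response, noise, snr_db):
    """Reverberate clean speech with an impulse response, then add scaled noise."""
    reverberant = convolve(clean, impulse_response)
    return add_noise_at_snr(reverberant, noise, snr_db)


# Usage with toy data (hypothetical values, not from the paper).
random.seed(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
rir = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]  # toy impulse response: direct path + two echoes
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
noisy = contaminate(clean, rir, noise, snr_db=20.0)
```

Full linear convolution makes the output longer than the input by `len(impulse_response) - 1` samples, which is the reverberant tail. A real pipeline might trim it or keep it, depending on how utterance boundaries are labeled.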
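For reference, MLLR in its standard mean-adaptation form updates each Gaussian mean of the HMMs with an affine transform. The transform is estimated to maximize the likelihood of the adaptation data. The abstract does not say how many regression classes were used or whether variances were also adapted, so only the generic mean transform is shown here:

```latex
\hat{\boldsymbol{\mu}}_m = \mathbf{A}\,\boldsymbol{\mu}_m + \mathbf{b} = \mathbf{W}\,\boldsymbol{\xi}_m,
\qquad
\boldsymbol{\xi}_m = \begin{bmatrix} 1 \\ \boldsymbol{\mu}_m \end{bmatrix},
\qquad
\mathbf{W} = \begin{bmatrix} \mathbf{b} & \mathbf{A} \end{bmatrix} \in \mathbb{R}^{d \times (d+1)}
```

Here $\boldsymbol{\mu}_m$ is the $d$-dimensional mean of Gaussian $m$. All Gaussians in the same regression class share $\mathbf{W}$, so a small amount of adaptation data can move many model parameters at once.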