A new approach for the adaptation of HMMs to reverberation and background noise

Authors:
Hans-Günter Hirsch;Harald Finster
Affiliations:
Niederrhein University of Applied Sciences, Department of Electrical Engineering and Computer Science, Reinarzstr. 49, 47805 Krefeld, Germany;Niederrhein University of Applied Sciences, Department of Electrical Engineering and Computer Science, Reinarzstr. 49, 47805 Krefeld, Germany
Venue:
Speech Communication
Year:
2008

Abstract

In practical applications of speech recognition systems, several distortion effects alter the speech signal and can considerably degrade recognition performance. So far, research has concentrated mainly on the influence of stationary background noise and of unknown frequency characteristics. A further source of distortion is hands-free speech input in a reverberant room. A new approach is presented that adapts the energy and spectral parameters of HMMs, as well as their time derivatives, to the modifications caused by speech input in a reverberant environment. The only parameter needed for the adaptation is an estimate of the reverberation time. The usefulness of this adaptation technique is shown by the improvements obtained in a series of recognition experiments on reverberant speech data. The approach for adapting the time derivatives of the acoustic parameters applies to distortions in general and is not restricted to hands-free input. Hands-free speech input also records any background noise present in the room, so the adaptation to reverberant conditions must be combined with the adaptation to background noise and unknown frequency characteristics. A combined adaptation scheme covering all of these effects is presented in this paper. The adaptation is based on the noise characteristics estimated before the beginning of speech is detected. The distortion parameters are estimated with signal processing techniques. The applicability is demonstrated by improvements on artificially distorted data as well as on real recordings in rooms.
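To illustrate why a single reverberation time can parameterize such an adaptation, the following minimal sketch uses the standard definition of reverberation time (the time in which sound energy decays by 60 dB) to derive a per-frame decay factor and to smear a short-term energy contour. It is not the adaptation algorithm of the paper: the function names, the 10 ms frame shift, and the purely exponential decay model are assumptions made for this example.

```python
import math

def frame_decay_factor(t60_s, frame_shift_s=0.01):
    """Linear power attenuation per frame, assuming a 60 dB decay over t60_s.

    The 10 ms default frame shift is an assumption for this sketch.
    """
    if t60_s <= 0:
        raise ValueError("t60_s must be positive")
    # 60 dB over t60_s seconds means 60 * frame_shift_s / t60_s dB per frame;
    # in linear power that is 10 ** (-(dB per frame) / 10).
    return 10.0 ** (-6.0 * frame_shift_s / t60_s)

def smear_energy(powers, t60_s, frame_shift_s=0.01):
    """Add an exponentially decaying tail of earlier frames to each frame.

    powers: per-frame energies in the linear power domain (hypothetical input,
    e.g. the mean energies of successive HMM states).
    """
    a = frame_decay_factor(t60_s, frame_shift_s)
    smeared = []
    previous = 0.0
    for p in powers:
        # Current frame plus the decayed reverberant energy of all earlier frames.
        previous = p + a * previous
        smeared.append(previous)
    return smeared

if __name__ == "__main__":
    # A single energy impulse followed by silence shows the decaying tail.
    print(smear_energy([1.0, 0.0, 0.0, 0.0], t60_s=0.6))
```

In this simplified model, a longer reverberation time gives a decay factor closer to 1, so energy from earlier frames leaks further into later ones. This is the kind of temporal smearing that the adapted energy, spectral, and time-derivative parameters have to account for.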