A new approach for the adaptation of HMMs to reverberation and background noise

  • Authors:
  • Hans-Günter Hirsch;Harald Finster

  • Affiliations:
  • Niederrhein University of Applied Sciences, Department of Electrical Engineering and Computer Science, Reinarzstr. 49, 47805 Krefeld, Germany;Niederrhein University of Applied Sciences, Department of Electrical Engineering and Computer Science, Reinarzstr. 49, 47805 Krefeld, Germany

  • Venue:
  • Speech Communication
  • Year:
  • 2008

Abstract

In practical applications of speech recognition systems, several distortion effects have a major influence on the speech signal and can considerably degrade recognition performance. So far, mainly the influence of stationary background noise and of unknown frequency characteristics has been studied. A further distortion effect arises from hands-free speech input in a reverberant room. A new approach is presented for adapting the energy and spectral parameters of HMMs, as well as their time derivatives, to the modifications caused by speech input in a reverberant environment. The only parameter needed for the adaptation is an estimate of the reverberation time. The usability of this adaptation technique is shown by the improvements obtained in a series of recognition experiments on reverberant speech data. The approach for adapting the time derivatives of the acoustic parameters can be applied to all types of distortion and is not restricted to hands-free input. Hands-free speech input also records any background noise present in the room. There is therefore a need to combine the adaptation to reverberant conditions with the adaptation to background noise and unknown frequency characteristics. A combined adaptation scheme for all of these effects is presented in this paper. The adaptation is based on an estimate of the noise characteristics obtained before the beginning of speech is detected. The distortion parameters are estimated with signal processing techniques. The applicability is demonstrated by the improvements on artificially distorted data as well as on real recordings in rooms.
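
To illustrate why a single reverberation time can be enough to describe the energy smearing, the following minimal sketch uses the standard definition of reverberation time (the time in which the sound energy decays by 60 dB) to derive a per-frame energy decay factor and applies it to a sequence of frame energies. It is not the adaptation scheme of the paper, whose details the abstract does not give. The function names, the 10 ms frame shift, and the purely exponential decay model are assumptions made for illustration.

```python
import math


def frame_decay_factor(t60_s, frame_shift_s=0.010):
    """Linear energy attenuation per frame for an exponential decay.

    Assumes the energy drops by 60 dB over t60_s seconds, i.e. by
    60 * frame_shift_s / t60_s dB per frame (hypothetical helper).
    """
    if t60_s <= 0 or frame_shift_s <= 0:
        raise ValueError("t60_s and frame_shift_s must be positive")
    return 10.0 ** (-6.0 * frame_shift_s / t60_s)


def add_reverberant_tail(energies, t60_s, frame_shift_s=0.010):
    """Add to each frame the decayed energy of all earlier frames.

    Implements out[n] = e[n] + decay * out[n-1], an assumed first-order
    model of how reverberation smears energy into following frames.
    """
    decay = frame_decay_factor(t60_s, frame_shift_s)
    smeared = []
    previous = 0.0
    for energy in energies:
        previous = energy + decay * previous
        smeared.append(previous)
    return smeared


if __name__ == "__main__":
    # A single energy burst followed by silence, with T60 = 0.6 s.
    print(add_reverberant_tail([1.0, 0.0, 0.0, 0.0], t60_s=0.6))
```

With a reverberation time of 0.6 s and a 10 ms frame shift, the factor is 10^(-0.1), roughly 0.794 per frame, so the energy of a frame is still about 60 dB down only after 60 frames. This is the kind of temporal spreading that an adaptation of the energy parameters and their time derivatives has to account for.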