Enhanced speech features by single-channel joint compensation of noise and reverberation

Authors:
Matthias Wölfel
Affiliations:
Institut für Theoretische Informatik, Universität Karlsruhe, Karlsruhe, Germany
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2009

Abstract

For natural verbal communication between humans and machines, automatic speech recognition that works reasonably well on recordings captured with mid- or far-field microphones is essential. While much research and development is devoted to addressing one of the two distortions frequently encountered in mid- and far-field sound pickup, namely noise or reverberation, less effort has been undertaken to combat both kinds of distortion jointly. In our view, however, joint treatment is essential to further reduce the degrading effect of moving the microphone away from the speaker's mouth, because both kinds of distortion are present in real environments. In this paper, we propose a first step in this direction by integrating an estimate of the reverberation energy, derived by an auxiliary model based on multistep linear prediction, into a framework that so far tracks and removes nonstationary additive distortion by particle filters in a low-dimensional logarithmic power frequency domain. On actual recordings with different speaker-to-microphone distances, we observe that combating either nonstationary noise or reverberation alone, in the feature space and on a single channel, already improves speech recognition performance both before and after acoustic model adaptation. Furthermore, we observe that a simple concatenation of techniques addressing either additive noise or reverberation can further improve the accuracy in some cases. Last but not least, we demonstrate that the joint estimation and removal of both kinds of distortion, as proposed in this publication, further improves the accuracy of the text output.
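The auxiliary reverberation model in the abstract rests on multistep (delayed) linear prediction. The following is a minimal sketch of that idea and of removing both distortions in a log power domain, not the paper's implementation. It assumes a time-domain signal, a least-squares fit of the predictor, and additive powers of speech, noise, and reverberation. The function names (`multistep_lp`, `predict_late_reverb`, `joint_log_compensation`) and the parameters `order`, `delay`, and `floor_db` are hypothetical. The sketch also swaps the paper's particle-filter noise tracking for a single fixed noise estimate, so it only illustrates joint subtraction and does not reproduce the published method.

```python
import numpy as np


def multistep_lp(x, order=20, delay=8):
    """Fit multistep linear prediction coefficients by least squares.

    x[n] is predicted from the delayed past x[n-delay], ..., x[n-delay-order+1].
    Skipping the most recent `delay` samples keeps the direct path and early
    reflections out of the prediction (assumption: delay given in samples).
    """
    x = np.asarray(x, dtype=float)
    start = delay + order - 1
    # Each row collects the delayed past samples used to predict x[n]
    A = np.array([x[n - delay - np.arange(order)] for n in range(start, len(x))])
    b = x[start:]
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs


def predict_late_reverb(x, coeffs, delay):
    """Apply the predictor; its output serves as a late-reverberation estimate.

    r[n] = sum_k coeffs[k] * x[n-delay-k], with samples before n=0 treated as zero.
    """
    x = np.asarray(x, dtype=float)
    full = np.convolve(x, coeffs)  # full[m] = sum_k coeffs[k] * x[m-k]
    r = np.zeros_like(x)
    r[delay:] = full[: len(x) - delay]  # shift by the prediction delay
    return r


def joint_log_compensation(log_y, log_n, log_r, floor_db=-20.0):
    """Remove noise and reverberation power estimates from an observed log power.

    Assumes |Y|^2 = |X|^2 + |N|^2 + |R|^2 (natural-log powers), so
    log|X|^2 = log|Y|^2 + log(1 - exp(log_n - log_y) - exp(log_r - log_y)).
    The gain is floored to avoid the log of non-positive values.
    """
    gain = 1.0 - np.exp(log_n - log_y) - np.exp(log_r - log_y)
    floor = 10.0 ** (floor_db / 10.0)
    return log_y + np.log(np.maximum(gain, floor))


if __name__ == "__main__":
    # A geometrically decaying signal is exactly predictable: x[n] = 0.9**3 * x[n-3]
    x = 0.9 ** np.arange(50)
    coeffs = multistep_lp(x, order=1, delay=3)
    print(coeffs)  # coefficient close to 0.9**3 = 0.729
    # Observed power 10 with noise power 2 and reverberation power 3 leaves 5
    print(np.exp(joint_log_compensation(np.log(10.0), np.log(2.0), np.log(3.0))))
```

In practice the prediction would pass through a short-time Fourier transform and a mel filterbank before its log power enters the compensation step; that framing is omitted here for brevity. The floor keeps the estimate finite when the noise and reverberation estimates exceed the observed power, at the cost of leaving some residual distortion.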