Static and dynamic variance compensation for recognition of reverberant speech with dereverberation preprocessing

Authors:
Marc Delcroix; Tomohiro Nakatani; Shinji Watanabe
Affiliations:
NTT Communication Science Laboratories, NTT Corporation, Kyoto, Japan (all three authors)
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2009

Abstract

Automatic speech recognition performance degrades severely in the presence of noise or reverberation. Noise robustness has been studied extensively, whereas the recognition of reverberant speech has received far less attention and remains very challenging. In this paper, we apply a dereverberation method to reduce reverberation prior to recognition. Such a preprocessor may remove most reverberation effects, but it often introduces distortion, causing a dynamic mismatch between the speech features and the acoustic model used for recognition. Model adaptation could reduce this mismatch; however, conventional model adaptation techniques assume a static mismatch and may therefore cope poorly with the dynamic mismatch arising from dereverberation. This paper proposes a novel adaptation scheme capable of managing both static and dynamic mismatches. We introduce a parametric model for variance adaptation that includes static and dynamic components in order to realize an appropriate interconnection between the dereverberation preprocessor and the speech recognizer. The model parameters are optimized using adaptive training implemented with the expectation-maximization (EM) algorithm. An experiment on reverberant speech with a reverberation time of 0.5 s showed that a relative word error rate reduction of 80% is achievable compared with the recognition of dereverberated speech (word error rate of 31%); the final error rate of 5.4% was obtained by combining the proposed variance compensation with MLLR adaptation.
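
To make the idea of variance compensation with static and dynamic components more concrete, the following is a minimal sketch and not the authors' formulation, which the abstract does not spell out. It assumes a diagonal-covariance Gaussian whose variance is rescaled by a static factor and increased by a frame-dependent term. The names `static_scale`, `dynamic_weight`, and `frame_uncertainty` are hypothetical, and the per-frame uncertainty is assumed to come from some external estimate of how unreliable the dereverberated frame is. The EM-based optimization of the parameters is not shown.

```python
import math


def compensated_log_likelihood(x, mean, model_var,
                               static_scale, dynamic_weight,
                               frame_uncertainty):
    """Log-likelihood of one feature vector under a diagonal Gaussian
    with a compensated variance (hypothetical form, for illustration).

    compensated variance = static_scale * model_var
                           + dynamic_weight * frame_uncertainty
    """
    total = 0.0
    for xi, mu, var, unc in zip(x, mean, model_var, frame_uncertainty):
        # Static term: rescales the trained variance for all frames.
        # Dynamic term: grows only on frames flagged as unreliable.
        v = static_scale * var + dynamic_weight * unc
        total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
    return total


# A frame that lies far from the model mean, e.g. because of distortion.
distorted = [5.0, 5.0]
mean = [0.0, 0.0]
model_var = [1.0, 1.0]

# No compensation: reduces to the ordinary Gaussian log-likelihood.
plain = compensated_log_likelihood(
    distorted, mean, model_var, 1.0, 0.0, [4.0, 4.0])

# With the dynamic term active, the distorted frame is penalized less.
compensated = compensated_log_likelihood(
    distorted, mean, model_var, 1.0, 1.0, [4.0, 4.0])
```

In this sketch, setting `dynamic_weight` to zero and `static_scale` to one recovers the unadapted model, so the two parameters can be read as controlling how far the recognizer departs from its trained variances on each frame.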