Reverberation model-based decoding in the logmelspec domain for robust distant-talking speech recognition

Authors:
Armin Sehr;Roland Maas;Walter Kellermann
Affiliations:
Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Erlangen, Germany;Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Erlangen, Germany;Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Erlangen, Germany
Venue:
IEEE Transactions on Audio, Speech, and Language Processing - Special issue on processing reverberant speech: methodologies and applications
Year:
2010

Abstract

The REMOS (REverberation MOdeling for Speech recognition) concept for reverberation-robust distant-talking speech recognition, introduced in "Distant-talking continuous speech recognition based on a novel reverberation model in the feature domain" (A. Sehr et al., in Proc. Interspeech, 2006, pp. 769-772) for melspectral (melspec) features, is extended to logarithmic melspectral (logmelspec) features in this contribution. Thus, the favorable properties of REMOS, including its high flexibility with respect to changing reverberation conditions, become available in the more competitive logmelspec domain. Based on a combined acoustic model consisting of a hidden Markov model (HMM) network and a reverberation model (RM), REMOS determines clean-speech and reverberation estimates during recognition. To this end, in each iteration of a modified Viterbi algorithm, an inner optimization operation maximizes the joint density of the current HMM output and the RM output subject to the constraint that their combination is equal to the current reverberant observation. Since the combination operation in the logmelspec domain is nonlinear, numerical methods appear necessary for solving the constrained inner optimization problem. A novel reformulation of the constraint, which allows for an efficient solution by nonlinear optimization algorithms, is derived in this paper so that a practical implementation of REMOS for logmelspec features becomes possible. An in-depth analysis of this REMOS implementation investigates the statistical properties of its reverberation estimates and thus derives possibilities for further improving the performance of REMOS. Connected digit recognition experiments show that the proposed REMOS version in the logmelspec domain significantly outperforms the melspec version.
While the proposed RMs with parameters estimated by straightforward training for a given room are robust to a mismatch of the speaker-microphone distance, their performance significantly decreases if they are used in a room with substantially different conditions. However, by training multi-style RMs with data from several rooms, good performance can be achieved across different rooms.
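
The constrained inner optimization described in the abstract can be illustrated with a heavily simplified Python sketch. It is not the paper's method. It assumes a single mel channel, one Gaussian for the HMM output s and one Gaussian for the RM output r, and it assumes that the combination in the logmelspec domain is x = log(exp(s) + exp(r)). The sigmoid substitution used to remove the constraint is an illustrative choice, not the reformulation derived in the paper, and all function and parameter names are hypothetical.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def split(x, u):
    """Map a free variable u to (s, r) such that log(exp(s) + exp(r)) == x.

    Assumed parametrization: s = x + log(sigmoid(u)), r = x + log(sigmoid(-u)).
    """
    s = x - softplus(-u)
    r = x - softplus(u)
    return s, r


def neg_log_joint(u, x, mu_s, var_s, mu_r, var_r):
    # Negative log of the joint Gaussian density of s and r, up to constants.
    s, r = split(x, u)
    return 0.5 * (s - mu_s) ** 2 / var_s + 0.5 * (r - mu_r) ** 2 / var_r


def inner_optimize(x, mu_s, var_s, mu_r, var_r,
                   lo=-30.0, hi=30.0, grid=601, iters=60):
    """Return (s, r) maximizing the joint density subject to the constraint."""
    def cost(u):
        return neg_log_joint(u, x, mu_s, var_s, mu_r, var_r)

    # Coarse grid search to bracket the minimum of the 1-D cost.
    step = (hi - lo) / (grid - 1)
    best = min(range(grid), key=lambda i: cost(lo + i * step))
    a = lo + max(best - 1, 0) * step
    b = lo + min(best + 1, grid - 1) * step

    # Golden-section refinement inside the bracket.
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - phi * (b - a)
    d = a + phi * (b - a)
    for _ in range(iters):
        if cost(c) < cost(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return split(x, 0.5 * (a + b))


if __name__ == "__main__":
    # Hypothetical values for one mel channel of one frame.
    s_hat, r_hat = inner_optimize(x=2.0, mu_s=1.5, var_s=0.5, mu_r=0.5, var_r=1.0)
    print("clean-speech estimate:", s_hat)
    print("reverberation estimate:", r_hat)
    print("recombined observation:", math.log(math.exp(s_hat) + math.exp(r_hat)))
```

Because the substitution satisfies the constraint for every value of u, the constrained problem becomes an unconstrained one-dimensional search per channel. Under the additional assumption of diagonal covariances, a full feature vector could be handled channel by channel in the same way. A real decoder would repeat such a step for each candidate state in each Viterbi iteration, which is why an efficient solution matters.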