Conversational speech recognition in non-stationary reverberated environments

Authors:
Rudy Rotili; Emanuele Principi; Martin Wöllmer; Stefano Squartini; Björn Schuller
Affiliations:
Dipartimento di Ingegneria dell'Informazione, Università Politecnica delle Marche, Ancona, Italy; Dipartimento di Ingegneria dell'Informazione, Università Politecnica delle Marche, Ancona, Italy; Institute for Human-Machine Communication, Technische Universität München, Germany; Dipartimento di Ingegneria dell'Informazione, Università Politecnica delle Marche, Ancona, Italy; Institute for Human-Machine Communication, Technische Universität München, Germany
Venue:
COST'11 Proceedings of the 2011 international conference on Cognitive Behavioural Systems
Year:
2011

Abstract

This paper presents a conversational speech recognition system that can operate in non-stationary reverberated environments. The system is composed of a dereverberation front-end, which exploits multiple distant microphones, and a speech recognition engine. The dereverberation front-end identifies the room impulse responses by means of a blind channel identification stage based on the Unconstrained Normalized Multi-Channel Frequency Domain Least Mean Square algorithm. The dereverberation stage is based on adaptive inverse filter theory and uses the identified responses to obtain a set of inverse filters, which are then used to estimate the clean speech. The speech recognizer is based on tied-state cross-word triphone models and decodes features computed from the dereverberated speech signal. Experiments conducted on the Buckeye corpus of conversational speech report a relative word accuracy improvement of 17.48% in the stationary case and of 11.16% in the non-stationary one.
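
The two front-end stages described in the abstract (blind channel identification followed by inverse filtering) can be illustrated with a small synthetic sketch. This is not the paper's method: the paper uses an adaptive frequency-domain algorithm and adaptive inverse filters, whereas the sketch below swaps in batch least-squares solutions for both stages, assumes only two noiseless channels, and assumes the channel length `L` is known. It relies on the cross-relation between two microphone signals of the same source, x1 * h2 = x2 * h1, which is the property that blind multichannel identification methods of this family exploit. All function and variable names (`conv_matrix`, `identify_channels`, `inverse_filters`, `cosine`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def conv_matrix(x, n_cols):
    """Return C such that C @ h == np.convolve(x, h) for len(h) == n_cols."""
    x = np.asarray(x, dtype=float)
    C = np.zeros((len(x) + n_cols - 1, n_cols))
    for k in range(n_cols):
        C[k:k + len(x), k] = x
    return C


def identify_channels(x1, x2, L):
    """Blind two-channel identification from the cross-relation x1*h2 = x2*h1.

    Returns estimates of (h1, h2), each of length L, up to one common scale.
    Batch SVD solution (assumption: not the adaptive algorithm of the paper).
    """
    # A @ [h1; h2] equals x1*h2 - x2*h1, which is zero for the true channels.
    A = np.hstack([-conv_matrix(x2, L), conv_matrix(x1, L)])
    # Take the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    h = Vt[-1]
    return h[:L], h[L:]


def inverse_filters(h1, h2, Lg):
    """Least-squares filters g1, g2 such that h1*g1 + h2*g2 ~= unit impulse."""
    H = np.hstack([conv_matrix(h1, Lg), conv_matrix(h2, Lg)])
    d = np.zeros(H.shape[0])
    d[0] = 1.0  # zero-delay target response (a simplifying assumption)
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return g[:Lg], g[Lg:]


def cosine(a, b):
    """Absolute normalized correlation; 1.0 means equal up to a scale factor."""
    return abs(np.dot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b))


# Synthetic demo: random source and random short "room" responses.
rng = np.random.default_rng(0)
L = 8
s = rng.standard_normal(400)
h1, h2 = rng.standard_normal(L), rng.standard_normal(L)
x1, x2 = np.convolve(s, h1), np.convolve(s, h2)  # two microphone signals

h1_est, h2_est = identify_channels(x1, x2, L)
g1, g2 = inverse_filters(h1_est, h2_est, Lg=L)
s_est = (np.convolve(x1, g1) + np.convolve(x2, g2))[:len(s)]

channel_similarity = cosine(np.concatenate([h1_est, h2_est]),
                            np.concatenate([h1, h2]))
source_similarity = cosine(s_est, s)
print(channel_similarity, source_similarity)
```

Because blind identification recovers the channels only up to a common scale factor, the sketch compares estimates with the ground truth through a normalized correlation rather than element by element. Real room impulse responses are far longer, noisy, and time-varying, which is why an adaptive scheme such as the one named in the abstract is used in practice.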