Recognition of noisy speech: a comparative survey of robust model architecture and feature enhancement

Authors:
Björn Schuller;Martin Wöllmer;Tobias Moosmayr;Gerhard Rigoll
Affiliations:
Institute for Human-Machine Communication, Technische Universität München, Munich, Germany;Institute for Human-Machine Communication, Technische Universität München, Munich, Germany;BMW Group, Forschungs- und Innovationszentrum, München, Germany;Institute for Human-Machine Communication, Technische Universität München, Munich, Germany
Venue:
EURASIP Journal on Audio, Speech, and Music Processing
Year:
2009

Abstract

The performance of speech recognition systems degrades strongly in the presence of background noise, such as the driving noise inside a car. In contrast to existing work, we aim to improve noise robustness at all major levels of speech recognition: feature extraction, feature enhancement, speech modelling, and training. We give an overview of promising auditory modelling concepts, speech enhancement techniques, training strategies, and model architectures, and implement them in an in-car digit and spelling recognition task that considers noise produced by various car types and driving conditions. We show that joint speech and noise modelling with a Switching Linear Dynamic Model (SLDM) outperforms speech enhancement techniques such as Histogram Equalisation (HEQ), with a mean relative error reduction of 52.7% over various noise types and levels. For speech disturbed by additive white Gaussian noise, embedding a Switching Linear Dynamical System (SLDS) into a Switching Autoregressive Hidden Markov Model (SAR-HMM) performs best.
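
For readers unfamiliar with the metric, the following is a minimal sketch of how a mean relative error reduction could be computed. It assumes the figure is the average, over noise conditions, of the per-condition relative reduction in error rate; the abstract does not spell out the exact averaging. The function name, condition names, and error rates are hypothetical and are not results from the paper.

```python
from statistics import mean


def relative_error_reduction(baseline_error: float, improved_error: float) -> float:
    """Return the relative error reduction in percent for two error rates in percent."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - improved_error) / baseline_error


# Hypothetical (baseline, improved) error rates per noise condition, in percent.
# Illustrative values only; they are NOT taken from the paper.
conditions = {
    "condition_a": (20.0, 10.0),
    "condition_b": (8.0, 6.0),
}

# Relative reduction for each condition, then the mean over conditions.
per_condition = {
    name: relative_error_reduction(baseline, improved)
    for name, (baseline, improved) in conditions.items()
}
mean_reduction = mean(per_condition.values())

print(per_condition)
print(mean_reduction)
```

Averaging the per-condition reductions weights every noise condition equally, regardless of its absolute error rate. Computing one reduction from pooled error rates would generally give a different number.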