The performance of speech recognition systems degrades strongly in the presence of background noise, such as the driving noise inside a car. In contrast to existing work, we aim to improve noise robustness at all major levels of speech recognition: feature extraction, feature enhancement, speech modelling, and training. We give an overview of promising auditory modelling concepts, speech enhancement techniques, training strategies, and model architectures, which we evaluate on an in-car digit and spelling recognition task considering noises produced by various car types and driving conditions. We show that joint speech and noise modelling with a Switching Linear Dynamic Model (SLDM) outperforms speech enhancement techniques such as Histogram Equalisation (HEQ), achieving a mean relative error reduction of 52.7% across various noise types and levels. Embedding a Switching Linear Dynamical System (SLDS) into a Switching Autoregressive Hidden Markov Model (SAR-HMM) prevails for speech disturbed by additive white Gaussian noise.
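Of the feature-enhancement baselines mentioned above, Histogram Equalisation is the simplest to illustrate: each feature dimension of the noisy utterance is remapped so that its empirical distribution matches a reference distribution estimated from clean training data. The sketch below is not the paper's implementation; it is a minimal quantile-mapping variant in NumPy, with the quantile grid size (101 points) and the function name chosen for illustration only.

```python
import numpy as np

def histogram_equalize(features, reference):
    """Quantile-based histogram equalisation (illustrative sketch).

    Maps each column of `features` (T x D noisy feature matrix) so its
    empirical distribution matches that of `reference` (N x D clean
    feature matrix), via CDF matching on a fixed quantile grid.
    """
    out = np.empty(features.shape, dtype=float)
    qs = np.linspace(0.0, 1.0, 101)  # quantile grid; an illustrative choice
    for d in range(features.shape[1]):
        src_q = np.quantile(features[:, d], qs)   # noisy inverse CDF samples
        ref_q = np.quantile(reference[:, d], qs)  # clean inverse CDF samples
        # Empirical CDF value of each noisy feature under its own distribution,
        ranks = np.interp(features[:, d], src_q, qs)
        # then pushed through the inverse CDF of the clean reference.
        out[:, d] = np.interp(ranks, qs, ref_q)
    return out
```

Applied to, say, cepstral features whose distribution has been shifted and broadened by additive car noise, this remapping restores the clean-data statistics dimension by dimension; it compensates distributional mismatch but, unlike the SLDM, models no temporal structure of speech or noise.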