Robustness is one of the most important topics in automatic speech recognition (ASR) for practical applications. Monaural speech separation based on computational auditory scene analysis (CASA) offers a solution to this problem. This paper presents a novel system for separating the monaural speech of two talkers. Gaussian mixture models (GMMs) and vector quantizers (VQs) are used to learn grouping cues from isolated clean data for each speaker. Given an utterance, speaker identification is first performed to identify the two speakers present in the utterance, the factorial-max vector quantization model (MAXVQ) is then used to infer the mask signals, and finally the utterance of the target speaker is resynthesized within the CASA framework. Recognition results on the 2006 speech separation challenge corpus show that the proposed system significantly improves the robustness of ASR.
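The mask-inference step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes log-spectral VQ codebooks for each speaker and the log-max approximation commonly used in factorial models (a mixture's log-spectrum is roughly the element-wise max of the sources' log-spectra), and all function and variable names here are hypothetical.

```python
import numpy as np

def maxvq_mask(mixture, codebook_a, codebook_b):
    """Sketch of factorial-max VQ (MAXVQ) binary-mask inference.

    mixture     : (T, F) log-spectrogram of the two-talker mixture.
    codebook_a/b: (Ka, F) / (Kb, F) VQ codewords learned per speaker.
    Returns a boolean mask (T, F): True where speaker A dominates.
    """
    T, F = mixture.shape
    mask = np.zeros((T, F), dtype=bool)
    # Under the log-max model, the predicted mixture for a codeword
    # pair (i, j) is the element-wise max of the two codewords.
    pred = np.maximum(codebook_a[:, None, :], codebook_b[None, :, :])
    for t in range(T):
        # Pick the codeword pair that best explains this mixture frame.
        err = ((pred - mixture[t]) ** 2).sum(axis=-1)   # shape (Ka, Kb)
        i, j = np.unravel_index(np.argmin(err), err.shape)
        # Mask is True in time-frequency units where A's codeword is louder.
        mask[t] = codebook_a[i] >= codebook_b[j]
    return mask
```

The resulting binary mask would then be applied to the mixture and the target utterance resynthesized, as in standard CASA pipelines; a full MAXVQ system would additionally exploit temporal dynamics rather than decoding each frame independently.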