Robustness is one of the most important topics in automatic speech recognition (ASR) for practical applications. Monaural speech separation based on computational auditory scene analysis (CASA) offers a solution to this problem. This paper presents a novel system for separating the monaural speech of two talkers. Gaussian mixture models (GMMs) and vector quantizers (VQs) are used to learn grouping cues from isolated clean data for each speaker. Given an utterance, speaker identification is first performed to identify the two speakers present in the utterance, the factorial-max vector quantization model (MAXVQ) is then used to infer the mask signals, and finally the utterance of the target speaker is resynthesized within the CASA framework. Recognition results on the 2006 speech separation challenge corpus show that the proposed system significantly improves the robustness of ASR.
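The mask-inference step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes log-spectral VQ codebooks for each speaker and the log-max approximation commonly used in factorial models (a mixture's log-spectrum is roughly the element-wise max of the sources' log-spectra), and all function and variable names here are hypothetical.

```python
import numpy as np

def maxvq_mask(mixture, codebook_a, codebook_b):
    """Sketch of factorial-max VQ (MAXVQ) binary-mask inference.

    mixture     : (T, F) log-spectrogram of the two-talker mixture.
    codebook_a/b: (Ka, F) / (Kb, F) VQ codewords learned per speaker.
    Returns a boolean mask (T, F): True where speaker A dominates.
    """
    T, F = mixture.shape
    mask = np.zeros((T, F), dtype=bool)
    # Under the log-max model, the predicted mixture for a codeword
    # pair (i, j) is the element-wise max of the two codewords.
    pred = np.maximum(codebook_a[:, None, :], codebook_b[None, :, :])
    for t in range(T):
        # Pick the codeword pair that best explains this mixture frame.
        err = ((pred - mixture[t]) ** 2).sum(axis=-1)   # shape (Ka, Kb)
        i, j = np.unravel_index(np.argmin(err), err.shape)
        # Mask is True in time-frequency units where A's codeword is louder.
        mask[t] = codebook_a[i] >= codebook_b[j]
    return mask
```

The resulting binary mask would then be applied to the mixture and the target utterance resynthesized, as in standard CASA pipelines; a full MAXVQ system would additionally exploit temporal dynamics rather than decoding each frame independently.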