IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
A conventional automatic speech recognizer does not perform well in the presence of multiple sound sources, while human listeners are able to segregate and recognize a signal of interest through auditory scene analysis. We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, the ideal binary time-frequency (T-F) mask which retains the mixture in a local T-F unit if and only if the target is stronger than the interference within the unit. In the first stage, we use harmonicity to segregate the voiced portions of individual sources in each time frame based on multipitch tracking. Additionally, unvoiced portions are segmented based on an onset/offset analysis. In the second stage, speaker characteristics are used to group the T-F units across time frames. The resulting masks are used in an uncertainty decoding framework for automatic speech recognition. We evaluate our system on a speech separation challenge and show that our system yields substantial improvement over the baseline performance.
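The ideal binary mask described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes access to the premixing target and interference signals in some time-frequency representation (e.g. cochleagram or spectrogram magnitudes), and the helper name and the local criterion parameter `lc_db` are hypothetical.

```python
import numpy as np

def ideal_binary_mask(target_tf, interference_tf, lc_db=0.0):
    """Sketch of the ideal binary time-frequency (T-F) mask.

    A T-F unit is retained (mask = 1) if and only if the target energy
    exceeds the interference energy by more than lc_db decibels; with
    lc_db = 0 this is the "target stronger than interference" criterion
    stated in the abstract. Inputs are magnitude T-F representations of
    the premixing target and interference (hypothetical setup).
    """
    # Convert magnitudes to energies in dB; the small constant avoids log(0).
    target_db = 10.0 * np.log10(np.abs(target_tf) ** 2 + 1e-12)
    interf_db = 10.0 * np.log10(np.abs(interference_tf) ** 2 + 1e-12)
    # Retain a unit only where the local SNR exceeds the criterion.
    return (target_db - interf_db > lc_db).astype(np.uint8)
```

In practice the mask is applied elementwise to the mixture's T-F representation, so retained units pass the mixture through and all other units are zeroed out.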