Robust speech recognition by integrating speech separation and hypothesis testing

Authors:
Soundararajan Srinivasan;DeLiang Wang
Affiliations:
Biomedical Engineering Department, The Ohio State University, Columbus, OH 43210, USA;Department of Computer Science and Engineering and Center for Cognitive Science, The Ohio State University, Columbus, OH 43210, USA
Venue:
Speech Communication
Year:
2010

Abstract

Missing-data methods attempt to improve the robustness of speech recognition by distinguishing between reliable and unreliable data in the time-frequency (T-F) domain. Such methods require a binary mask that labels the speech-dominant T-F regions of a noisy speech signal as reliable and the rest as unreliable. Current methods for computing the mask are based mainly on bottom-up cues such as harmonicity, and they produce labeling errors that degrade recognition performance. In this paper, we propose a two-stage recognition system that combines bottom-up and top-down cues to improve mask estimation and recognition accuracy simultaneously. First, an n-best lattice consistent with a speech separation mask is generated. The lattice is then re-scored by expanding the mask, using a model-based hypothesis test to determine the reliability of individual T-F units. Systematic evaluations of the proposed system show significant improvement in recognition performance compared with that obtained using speech separation alone.
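
The binary labeling that missing-data methods depend on can be illustrated with a minimal sketch. It is not the system proposed in the paper: it assumes the per-unit speech and noise energies are known in advance (an oracle setting), whereas the paper estimates the mask from the noisy signal. The function name `binary_mask`, the parameter `threshold_db`, and the sample energies are hypothetical and chosen only for illustration.

```python
def binary_mask(speech_energy, noise_energy, threshold_db=0.0):
    """Label each T-F unit 1 (reliable) or 0 (unreliable).

    Assumption: speech_energy and noise_energy are equally sized
    lists of rows (frequency channels) of per-frame energies that
    are known separately, which is not the case in practice.
    A unit is reliable when its local SNR exceeds threshold_db.
    """
    # Convert the dB threshold to a linear energy ratio.
    ratio = 10.0 ** (threshold_db / 10.0)
    return [
        [1 if s > ratio * n else 0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_energy, noise_energy)
    ]


# Hypothetical energies: 2 frequency channels by 3 time frames.
speech = [[4.0, 0.5, 2.0], [0.1, 3.0, 1.0]]
noise = [[1.0, 1.0, 2.0], [1.0, 1.0, 0.5]]

mask = binary_mask(speech, noise)
print(mask)
```

With a 0 dB threshold, a unit is labeled reliable only when speech energy strictly exceeds noise energy, so a unit with equal energies is treated as unreliable.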