Robust speech recognition by integrating speech separation and hypothesis testing

  • Authors:
  • Soundararajan Srinivasan; DeLiang Wang

  • Affiliations:
  • Biomedical Engineering Department, The Ohio State University, Columbus, OH 43210, USA; Department of Computer Science and Engineering and Center for Cognitive Science, The Ohio State University, Columbus, OH 43210, USA

  • Venue:
  • Speech Communication
  • Year:
  • 2010

Abstract

Missing-data methods attempt to make speech recognition more robust by distinguishing between reliable and unreliable data in the time-frequency (T-F) domain. Such methods require a binary mask that labels speech-dominant T-F regions of a noisy speech signal as reliable and the rest as unreliable. Current methods for computing the mask are based mainly on bottom-up cues such as harmonicity and produce labeling errors that degrade recognition performance. In this paper, we propose a two-stage recognition system that combines bottom-up and top-down cues in order to simultaneously improve both mask estimation and recognition accuracy. First, an n-best lattice consistent with a speech separation mask is generated. The lattice is then re-scored by expanding the mask using a model-based hypothesis test to determine the reliability of individual T-F units. Systematic evaluations of the proposed system show significant improvement in recognition performance compared to that using speech separation alone.
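
To illustrate what a binary T-F mask is, the following minimal Python sketch labels each T-F unit as reliable (1) when speech energy exceeds noise energy by a threshold, and as unreliable (0) otherwise. It is not the mask estimator described in the abstract: it assumes separate access to the speech and noise energies (an oracle setting), whereas the paper estimates the mask from the noisy signal alone. The function name `binary_mask`, the parameter `threshold_db`, and the toy energy values are hypothetical and chosen only for illustration.

```python
import math

def binary_mask(speech_energy, noise_energy, threshold_db=0.0):
    """Return a 0/1 mask over a grid of T-F units.

    Assumption: both inputs are lists of rows with identical shape,
    holding non-negative energies for each T-F unit. A unit is labeled
    reliable (1) when its local SNR in dB exceeds threshold_db.
    """
    eps = 1e-12  # guards against log of zero and division by zero
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask

# Toy example: 2 frequency channels x 3 time frames (made-up values).
speech = [[4.0, 0.1, 2.0],
          [0.5, 3.0, 0.2]]
noise = [[1.0, 1.0, 1.0],
         [1.0, 1.0, 1.0]]
mask = binary_mask(speech, noise)
print(mask)
```

In a missing-data recognizer, units marked 0 would be treated as unreliable evidence rather than used directly; how the second stage revises such labels through hypothesis testing is specific to the paper and is not reproduced here.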