Incorporating spectral subtraction and noise type for unvoiced speech segregation

  • Authors:
  • Ke Hu; DeLiang Wang

  • Affiliations:
  • Department of Computer Science and Engineering & Center for Cognitive Science, The Ohio State University, Columbus, OH 43210-1277, USA (both authors)

  • Venue:
  • ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • Year:
  • 2009


Abstract

Unvoiced speech poses a significant challenge to current monaural speech segregation systems: it lacks harmonic structure and, because of its relatively weak energy, is highly susceptible to interference. This paper describes a new approach to segregating unvoiced speech from nonspeech interference. The system first estimates a voiced binary mask and then performs unvoiced speech segregation in two stages: segmentation and grouping. In segmentation, time-frequency units labeled 0 in the voiced binary mask are used to estimate the noise energy, and spectral subtraction is then performed to generate time-frequency segments in unvoiced intervals. Depending on the type of noise, unvoiced segments are grouped either by selecting segments consistent with those generated by onset/offset analysis or by Bayesian classification of acoustic-phonetic features. Systematic evaluation and comparison show that the proposed approach considerably improves the performance of unvoiced speech segregation.
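The segmentation stage hinges on two standard operations: estimating noise energy from units marked as noise-dominant (label 0 in the voiced binary mask) and subtracting that estimate from the noisy spectrum. A minimal NumPy sketch of power spectral subtraction along these lines is shown below; the matrix layout, the per-channel averaging, and the floor factor `beta` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def spectral_subtraction(noisy_power, noise_mask, beta=0.01):
    """Sketch of power spectral subtraction over time-frequency units.

    noisy_power: T x F array of power spectra (time frames x channels).
    noise_mask:  T x F binary array; 1 marks units assumed noise-dominant
                 (corresponding to label 0 in a voiced binary mask).
    beta:        hypothetical spectral-floor factor to avoid negative power.
    """
    # Estimate noise power per frequency channel by averaging the
    # noise-dominant units in that channel.
    counts = noise_mask.sum(axis=0)
    sums = (noisy_power * noise_mask).sum(axis=0)
    noise_est = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)

    # Subtract the noise estimate from every frame; floor the result at
    # a small fraction of the noise estimate (the "spectral floor").
    clean_est = np.maximum(noisy_power - noise_est, beta * noise_est)
    return clean_est
```

Thresholding or segment formation on `clean_est` (grouping nearby units with significant residual energy) would then yield candidate unvoiced segments, which the paper goes on to group by noise type.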