Incorporating spectral subtraction and noise type for unvoiced speech segregation

  • Authors:
  • Ke Hu; DeLiang Wang

  • Affiliations:
  • Department of Computer Science and Engineering & Center for Cognitive Science, The Ohio State University, Columbus, OH 43210-1277, USA (both authors)

  • Venue:
  • ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • Year:
  • 2009


Abstract

Unvoiced speech poses a significant challenge to current monaural speech segregation systems: it lacks harmonic structure and, because of its relatively weak energy, is highly susceptible to interference. This paper describes a new approach to segregating unvoiced speech from nonspeech interference. The system first estimates a voiced binary mask and then performs unvoiced speech segregation in two stages: segmentation and grouping. In segmentation, time-frequency units labeled 0 in the voiced binary mask are used to estimate the noise energy, and spectral subtraction is then performed to generate time-frequency segments in unvoiced intervals. Depending on the type of noise, unvoiced segments are grouped either by selecting segments consistent with those generated by onset/offset analysis or by Bayesian classification of acoustic-phonetic features. Systematic evaluation and comparison show that the proposed approach considerably improves the performance of unvoiced speech segregation.
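The segmentation stage hinges on two standard operations: estimating noise energy from units marked as noise-dominant (label 0 in the voiced binary mask) and subtracting that estimate from the noisy spectrum. A minimal NumPy sketch of power spectral subtraction along these lines is shown below; the matrix layout, the per-channel averaging, and the floor factor `beta` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def spectral_subtraction(noisy_power, noise_mask, beta=0.01):
    """Sketch of power spectral subtraction over time-frequency units.

    noisy_power: T x F array of power spectra (time frames x channels).
    noise_mask:  T x F binary array; 1 marks units assumed noise-dominant
                 (corresponding to label 0 in a voiced binary mask).
    beta:        hypothetical spectral-floor factor to avoid negative power.
    """
    # Estimate noise power per frequency channel by averaging the
    # noise-dominant units in that channel.
    counts = noise_mask.sum(axis=0)
    sums = (noisy_power * noise_mask).sum(axis=0)
    noise_est = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)

    # Subtract the noise estimate from every frame; floor the result at
    # a small fraction of the noise estimate (the "spectral floor").
    clean_est = np.maximum(noisy_power - noise_est, beta * noise_est)
    return clean_est
```

Thresholding or segment formation on `clean_est` (grouping nearby units with significant residual energy) would then yield candidate unvoiced segments, which the paper goes on to group by noise type.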