Robust speech separation using time-frequency masking

Authors:
P. Aarabi;Guangji Shi;O. Jahromi
Affiliations:
Artificial Perception Lab., Toronto Univ., Ont., Canada;Artificial Perception Lab., Toronto Univ., Ont., Canada;Artificial Perception Lab., Toronto Univ., Ont., Canada
Venue:
ICME '03 Proceedings of the 2003 International Conference on Multimedia and Expo - Volume 2
Year:
2003

Abstract

A multi-microphone time-frequency masking technique for speech is proposed. The technique uses both the magnitude and the phase of each time-frequency block to estimate the masking coefficients that maximize the signal-to-noise ratio (SNR), given that the direction (or alternatively, the time delay of arrival) of the speaker of interest is known. With this masking algorithm, speech features (such as formants) arriving from the direction of interest are preserved, while features from other directions are severely degraded. Digit recognition experiments indicate that the proposed technique can substantially increase digit recognition accuracy. At 0 dB, for example, it improves the digit recognition accuracy rate by 26% over the single-microphone case and by 12% over the two-microphone superdirective beamforming case.
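
The abstract does not give the masking formula, so the following is only a minimal sketch of the general idea of direction-dependent time-frequency masking for two microphones, not the authors' algorithm. It assumes the complex short-time spectra of both channels are already available, and it uses phase information only (the paper also uses magnitude). The function names, the mask shape 1 / (1 + gamma * error^2), and the parameter gamma are all illustrative assumptions.

```python
import numpy as np


def phase_error_mask(X1, X2, freqs_hz, tdoa_s, gamma=5.0):
    """Soft mask in (0, 1] for each time-frequency block.

    X1, X2   : complex spectra of the two microphones, shape (frames, bins).
    freqs_hz : centre frequency of each bin in Hz, shape (bins,).
    tdoa_s   : known delay of microphone 2 relative to microphone 1, in seconds.
    gamma    : hypothetical sharpness parameter (an assumption, not from the paper).
    """
    # Phase difference actually observed between the two channels.
    observed = np.angle(X1 * np.conj(X2))
    # Phase difference a source with the known time delay would produce.
    expected = 2.0 * np.pi * freqs_hz * tdoa_s
    # Wrap the mismatch into (-pi, pi] so that full cycles do not count as error.
    error = np.angle(np.exp(1j * (observed - expected)))
    # Blocks consistent with the target direction get a mask near 1;
    # blocks with a large phase mismatch are attenuated.
    return 1.0 / (1.0 + gamma * error ** 2)


def separate(X1, X2, freqs_hz, tdoa_s, gamma=5.0):
    """Time-align the two channels, average them, and apply the mask."""
    mask = phase_error_mask(X1, X2, freqs_hz, tdoa_s, gamma)
    aligned = X2 * np.exp(2j * np.pi * freqs_hz * tdoa_s)
    return mask * 0.5 * (X1 + aligned)
```

In this sketch, a block whose inter-microphone phase matches the known time delay passes through almost unchanged, while a block dominated by a source from another direction is scaled down. A larger gamma makes the mask more selective at the cost of more distortion when the phase estimates are noisy.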