Robust speech separation using time-frequency masking

  • Authors:
  • P. Aarabi; Guangji Shi; O. Jahromi

  • Affiliations:
  • Artificial Perception Lab., Toronto Univ., Ont., Canada; Artificial Perception Lab., Toronto Univ., Ont., Canada; Artificial Perception Lab., Toronto Univ., Ont., Canada

  • Venue:
  • ICME '03 Proceedings of the 2003 International Conference on Multimedia and Expo - Volume 2
  • Year:
  • 2003

Abstract

A multi-microphone time-frequency speech masking technique is proposed. The technique uses both the magnitude and the phase of each time-frequency block to estimate the masking coefficients that maximize the signal-to-noise ratio (SNR), given that the direction (or alternatively, the time-delay of arrival) of the speaker of interest is known. With this masking algorithm, speech features (such as formants) arriving from the direction of interest are preserved, while features from other directions are severely degraded. Digit recognition experiments indicate that the proposed technique can substantially increase the digit recognition accuracy rate. At 0 dB, for example, it improves the digit recognition accuracy rate by 26% over the single-microphone case and by 12% over the two-microphone superdirective beamforming case.
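
The abstract does not state the mask formula, so the following is only a minimal sketch of how phase information could drive a time-frequency mask when the time-delay of arrival is known. It assumes two microphones, with microphone 2 receiving the target `delay_s` seconds after microphone 1. The weighting `1 / (1 + gamma * theta**2)`, the parameter `gamma`, and all function names are hypothetical choices made for illustration; they are not taken from the paper, and the sketch does not perform the SNR-maximizing estimation the abstract describes.

```python
import cmath
import math


def wrap_phase(angle):
    """Wrap an angle in radians into [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def phase_error(x1, x2, freq_hz, delay_s):
    """Phase error of one time-frequency bin relative to the target delay.

    x1, x2  : complex STFT coefficients of the same bin at mics 1 and 2.
    delay_s : assumed known delay; mic 2 receives the target delay_s later.
    """
    observed = cmath.phase(x1 * x2.conjugate())   # phase(x1) - phase(x2)
    expected = 2.0 * math.pi * freq_hz * delay_s  # phase lead of mic 1 for the target
    return wrap_phase(observed - expected)


def mask_coefficient(theta, gamma=5.0):
    """Hypothetical soft mask: 1 at zero phase error, smaller as |theta| grows."""
    return 1.0 / (1.0 + gamma * theta * theta)


def mask_bin(x1, x2, freq_hz, delay_s, gamma=5.0):
    """Return the masked microphone-1 coefficient for one time-frequency bin."""
    theta = phase_error(x1, x2, freq_hz, delay_s)
    return mask_coefficient(theta, gamma) * x1


# Illustrative values only (assumptions, not data from the paper).
freq = 1000.0   # bin centre frequency in Hz
delay = 1e-4    # target time-delay of arrival in seconds
x1 = 1.0 + 0.0j

# A bin dominated by the target: mic 2 lags mic 1 by exactly `delay`.
target_x2 = x1 * cmath.exp(-2j * math.pi * freq * delay)
# A bin dominated by a source on the opposite side: mic 2 leads mic 1.
interferer_x2 = x1 * cmath.exp(2j * math.pi * freq * delay)

target_gain = mask_coefficient(phase_error(x1, target_x2, freq, delay))
interferer_gain = mask_coefficient(phase_error(x1, interferer_x2, freq, delay))

print("target gain:", target_gain)          # close to 1
print("interferer gain:", interferer_gain)  # well below 1
```

In this sketch, bins whose inter-microphone phase matches the known delay pass almost unchanged, while bins whose phase points to another direction are attenuated. A full system would apply `mask_bin` to every bin of a short-time Fourier transform and resynthesize the signal.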