Fundamental limitation of frequency domain blind source separation for convolutive mixture of speech

  • Authors:
  • S. Araki;S. Makino;T. Nishikawa;H. Saruwatari

  • Affiliations:
  • NTT Commun. Sci. Labs., Kyoto, Japan;-;-;-

  • Venue:
  • ICASSP '01: Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 05
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Abstract

Despite several recent proposals to achieve blind source separation (BSS) for realistic acoustic signals, separation performance is still not good enough. In particular, when the room impulse response is long, performance is highly limited. We show that it is useless to be constrained by the condition P ≪ T, where T is the FFT frame size and P is the length of the room impulse response. In our experiments, a frame size of 256 or 512 (32 or 64 ms at a sampling frequency of 8 kHz) is best even for long room reverberation times of T_R = 150 and 300 ms. We also clarify the reason for the poor performance of BSS in a long reverberant environment, finding that separation is achieved chiefly for the sound from the direction of jammers, because BSS cannot calculate the inverse of the room transfer function for both the target and jammer signals.
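
The frame durations quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and treating the reverberation time T_R as a rough proxy for the impulse response length P is an assumption made here, not a statement from the paper.

```python
FS_HZ = 8000  # sampling frequency stated in the abstract


def frame_ms(frame_size, fs_hz=FS_HZ):
    """Duration of an FFT frame of `frame_size` samples, in milliseconds."""
    return 1000.0 * frame_size / fs_hz


def samples_for_ms(duration_ms, fs_hz=FS_HZ):
    """Number of samples spanned by `duration_ms` milliseconds."""
    return round(duration_ms * fs_hz / 1000.0)


# Frame sizes T reported as best in the abstract.
frame_durations = {T: frame_ms(T) for T in (256, 512)}

# Assumption: use T_R (150 and 300 ms) as a rough stand-in for P.
reverb_samples = {t_r: samples_for_ms(t_r) for t_r in (150, 300)}

for T, ms in frame_durations.items():
    print(f"T = {T} samples is {ms} ms")
for t_r, P in reverb_samples.items():
    print(f"T_R = {t_r} ms spans about {P} samples")
```

Under that assumption, the best-performing frame sizes (256 or 512 samples) are shorter than the reverberation (about 1200 or 2400 samples), which is consistent with the abstract's claim that enforcing P ≪ T does not help.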