Speakers' direction finding using estimated time delays in the frequency domain

  • Authors:
  • Baruch Berdugo; Judith Rosenhouse; Haim Azhari

  • Affiliations:
  • The Julius Silver Institute of Biomedical Engineering, Technion -- IIT, Haifa 32000, Israel and Lamar Signal Processing Ltd., P.O. Box 573, Yokneam Ilit 20692, Israel; The Department of Humanities and Arts, Technion -- IIT, Haifa 32000, Israel; The Julius Silver Institute of Biomedical Engineering, Technion -- IIT, Haifa 32000, Israel

  • Venue:
  • Signal Processing
  • Year:
  • 2002

Abstract

Speaker localization is an important issue in the study of human communication and has a variety of practical applications. When two or more speakers talk simultaneously, finding the direction of arrival of each speech signal is a complicated task. The spectral separation between different speech signals was first quantified: on average, some 40% of the spectral information in the 0-5 kHz band was found to differ significantly (by at least 10 dB) between any two speakers, even when they speak the same utterance at the same time and with the same intensity. The signals were then analyzed in the frequency domain, which transforms the problem into a set of single-source, single-frequency problems. This made it possible to apply a time delay direction finding (TDDF) algorithm (Berdugo et al., J. Acoust. Soc. Am. 105 (6) (1999) 3355). Next, a new "fusion" algorithm was developed, which extended the solution to separate the speech signals of two speakers at low SNR values. The results obtained in simulations as well as in experimental studies demonstrated high angular resolution between two speakers (approximately 20° for a 10 cm array extent), even at low SNR. This algorithm may be suitable for various applications, such as video conferencing and hearing aids.
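
The idea of reducing a two-speaker problem to single-source, single-frequency problems can be illustrated with a minimal sketch. It is not the authors' TDDF or "fusion" algorithm, whose details the abstract does not give. It assumes only two microphones 10 cm apart, far-field sources, a sound speed of 343 m/s, and two hypothetical "speakers" that are pure tones at different frequencies. All function names are illustrative. At each frequency, the phase of the cross-spectrum gives a time delay, and the delay gives an arrival angle.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def bin_value(signal, freq_hz, sample_rate):
    """Return the DFT coefficient of `signal` at a single frequency."""
    step = -2j * math.pi * freq_hz / sample_rate
    return sum(x * cmath.exp(step * n) for n, x in enumerate(signal))


def delay_at_frequency(mic1, mic2, freq_hz, sample_rate):
    """Delay of mic2 relative to mic1 (seconds) from the cross-spectrum phase."""
    x1 = bin_value(mic1, freq_hz, sample_rate)
    x2 = bin_value(mic2, freq_hz, sample_rate)
    cross = x2 * x1.conjugate()
    return -cmath.phase(cross) / (2 * math.pi * freq_hz)


def direction_deg(delay_s, spacing_m):
    """Convert a delay to a far-field arrival angle relative to broadside."""
    sine = SPEED_OF_SOUND * delay_s / spacing_m
    return math.degrees(math.asin(max(-1.0, min(1.0, sine))))


def simulate_two_mics(sources, spacing_m, sample_rate, n_samples):
    """Hypothetical test signal: each source is a pure tone (freq_hz, angle_deg)."""
    mic1 = [0.0] * n_samples
    mic2 = [0.0] * n_samples
    for freq_hz, angle_deg in sources:
        tau = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
        for n in range(n_samples):
            t = n / sample_rate
            mic1[n] += math.cos(2 * math.pi * freq_hz * t)
            mic2[n] += math.cos(2 * math.pi * freq_hz * (t - tau))
    return mic1, mic2


def directions_per_frequency(mic1, mic2, freqs_hz, spacing_m, sample_rate):
    """Solve one single-frequency direction problem per requested frequency."""
    return {
        f: direction_deg(delay_at_frequency(mic1, mic2, f, sample_rate), spacing_m)
        for f in freqs_hz
    }


if __name__ == "__main__":
    # Two simultaneous "speakers": 500 Hz from +30 degrees, 1250 Hz from -20 degrees.
    m1, m2 = simulate_two_mics([(500.0, 30.0), (1250.0, -20.0)], 0.1, 16000, 320)
    print(directions_per_frequency(m1, m2, [500.0, 1250.0], 0.1, 16000))
```

In this idealized, noise-free setup the two estimates come out close to +30 and -20 degrees, because each frequency is dominated by a single source. With a single microphone pair the phase is unambiguous only below c/(2d), about 1.7 kHz for d = 10 cm, so the sketch does not cover the full 0-5 kHz band discussed in the abstract. Real speech would also need short-time analysis and some way of combining the per-frequency estimates.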