Speakers' direction finding using estimated time delays in the frequency domain

Authors:
Baruch Berdugo;Judith Rosenhouse;Haim Azhari
Affiliations:
The Julius Silver Institute of Biomedical Engineering, Technion -- IIT, Haifa 32000, Israel and Lamar Signal Processing Ltd., P.O. Box 573, Yokneam Ilit 20692, Israel;The Department of Humanities and Arts, Technion -- IIT, Haifa 32000, Israel;The Julius Silver Institute of Biomedical Engineering, Technion -- IIT, Haifa 32000, Israel
Venue:
Signal Processing
Year:
2002

Citing 3
Cited 1

Talker Variability in Speech Processing

Voice Source Localization for Automatic Camera Pointing System in Videoconferencing

ICASSP '97: Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 1
Tracking Multiple Talkers Using Microphone-Array Measurements

ICASSP '97: Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 1

Providing the basis for human-robot-interaction: a multi-modal attention system for a mobile robot

Proceedings of the 5th international conference on Multimodal interfaces

Abstract

Speaker localization is an important issue in the study of human communication and is relevant to a variety of practical applications. When two or more speakers talk simultaneously, finding the direction of arrival of each speech signal is a complicated task. The spectral separation between different speech signals was first quantified. On average, some 40% of the spectral information in the 0-5 kHz band was found to differ significantly (by at least 10 dB) between any two speakers, even when they speak the same utterance at the same time and with the same intensity. Signals were analyzed in the frequency domain to transform the problem into a set of single-source, single-frequency problems. This made it possible to apply a time delay direction finding (TDDF) algorithm (Berdugo et al., J. Acoust. Soc. Am. 105 (6) (1999) 3355). Next, a new "fusion" algorithm was developed, which extended the solution to separate the speech signals of two speakers at low SNR values. The results obtained in simulations and in experimental studies demonstrated high angular resolution between two speakers (approximately 20° for a 10 cm array extent), even at low SNR. This algorithm may be suitable for various applications, such as video conferencing and hearing aids.
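The two ideas the abstract relies on can be illustrated with a minimal sketch. It is not the authors' TDDF or "fusion" algorithm, whose details are not given here. It assumes a single pair of microphones, a far-field plane wave, and a sound speed of 343 m/s. The function names `fraction_differing` and `bin_doa_degrees` are hypothetical. The first function measures the share of frequency bins in which two power spectra differ by at least a threshold (10 dB in the abstract). The second converts the phase difference in one frequency bin into a time delay and then into an arrival angle measured from broadside.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the paper


def fraction_differing(power_a, power_b, threshold_db=10.0):
    """Fraction of bins where two power spectra differ by >= threshold_db.

    power_a and power_b are equal-length sequences of strictly positive
    per-bin power values (an assumption of this sketch).
    """
    differing = 0
    for pa, pb in zip(power_a, power_b):
        level_difference_db = abs(10.0 * math.log10(pa / pb))
        if level_difference_db >= threshold_db:
            differing += 1
    return differing / len(power_a)


def bin_doa_degrees(x1, x2, freq_hz, spacing_m):
    """Arrival angle (degrees from broadside) from one frequency bin.

    x1 and x2 are the complex spectral values of the same bin at
    microphone 1 and microphone 2. A positive angle means the wavefront
    reaches microphone 1 first.
    """
    # Phase of x1 * conj(x2) equals 2*pi*f*tau when mic 2 lags mic 1 by tau.
    phase = cmath.phase(x1 * x2.conjugate())
    delay_s = phase / (2.0 * math.pi * freq_hz)
    # Path-length difference over spacing is sin(angle); clamp against noise.
    sine = max(-1.0, min(1.0, SPEED_OF_SOUND * delay_s / spacing_m))
    return math.degrees(math.asin(sine))


if __name__ == "__main__":
    # Synthetic single-source bin: 1 kHz tone, 10 cm spacing, 20 degree source.
    tau = 0.1 * math.sin(math.radians(20.0)) / SPEED_OF_SOUND
    x1 = complex(1.0, 0.0)
    x2 = x1 * cmath.exp(-2j * math.pi * 1000.0 * tau)
    print(bin_doa_degrees(x1, x2, 1000.0, 0.1))
    print(fraction_differing([1.0, 100.0, 1.0, 1.0], [1.0, 1.0, 20.0, 5.0]))
```

In this simplified two-microphone form, the phase-to-delay mapping is unambiguous only while the inter-microphone phase stays within ±π, that is, for frequencies below c/(2d), or roughly 1.7 kHz for a 10 cm spacing at the assumed sound speed. A practical system would also need to combine the per-bin estimates across many frequencies, which this sketch does not attempt.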