Microphone arrays enable the enhancement of speech signals recorded in meeting rooms and office spaces. A common solution for speech enhancement in realistic environments with ambient noise and multi-path propagation is the application of so-called beamforming techniques. Beamforming algorithms enhance signals arriving from the desired direction through constructive interference while attenuating signals from other directions through destructive interference. However, these techniques require a priori knowledge of the source's time difference of arrival (TDOA), so source localization and tracking algorithms are an integral part of such a system. Conventional localization algorithms deteriorate in realistic scenarios with multiple concurrent speakers. In contrast to conventional methods, the techniques presented in this paper exploit the pitch information of speech signals in addition to the location information. The proposed "position-pitch" algorithm pre-processes the speech signals with a multiband gammatone filterbank inspired by the auditory model of the human inner ear; the role of this filterbank is analyzed and discussed in detail. For robust localization of multiple concurrent speakers, a frequency-selective criterion is explored, motivated by studies of how the human neural system exploits correlations between adjacent sub-band frequencies; this criterion improves localization performance. To further improve localization accuracy, an algorithm based on grouping spectro-temporal regions formed by pitch cues is presented. All proposed speaker localization algorithms are evaluated on a multichannel database in which multiple concurrent speakers are active. The real-world recordings were made with a 24-channel uniform circular microphone array, using both loudspeakers and human speakers, under various acoustic conditions including scenarios with moving concurrent speakers.
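The TDOA knowledge that beamforming requires is conventionally obtained with cross-correlation-based estimators such as the generalized cross-correlation with phase transform (GCC-PHAT). As background for the conventional approach the paper improves upon, here is a minimal GCC-PHAT sketch; it is illustrative only and not the paper's algorithm, and the function and variable names are the author's own:

```python
import numpy as np

def gcc_phat(sig, ref, fs=1.0):
    """Estimate the delay (in seconds) of `sig` relative to `ref`:
    positive if `sig` lags `ref`. Standard GCC-PHAT (Knapp & Carter, 1976)."""
    n = len(sig) + len(ref)                 # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(sig, n=n)
    Y = np.fft.rfft(ref, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # lags -n/2 .. n/2
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Synthetic check: a broadband signal and a copy delayed by 5 samples.
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
delay = 5
x = s
y = np.concatenate((np.zeros(delay), s[:-delay]))  # y lags x by 5 samples
print(gcc_phat(y, x))
```

The PHAT weighting discards the magnitude spectrum and keeps only phase, which sharpens the correlation peak under reverberation; in realistic multi-speaker scenarios this peak nevertheless degrades, which motivates the pitch-based cues explored in the paper.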
In the scenarios tested, the proposed techniques achieved localization performance significantly better than the state-of-the-art baseline.
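The gammatone front end mentioned above models the frequency selectivity of the human inner ear by decomposing the signal into overlapping auditory bands. The following sketch uses a common FIR approximation of a fourth-order gammatone filterbank with Glasberg–Moore ERB bandwidths; the band count, spacing, and truncation length are illustrative conventions, not necessarily the paper's exact configuration:

```python
import numpy as np

def erb(fc):
    # Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.05):
    # Fourth-order gammatone impulse response, truncated to `duration` seconds
    t = np.arange(int(duration * fs)) / fs
    g = t**3 * np.exp(-2 * np.pi * 1.019 * erb(fc) * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def gammatone_filterbank(x, fs, center_freqs):
    # One FIR gammatone filter per center frequency; rows are sub-band signals
    return np.stack([np.convolve(x, gammatone_ir(fc, fs))[:len(x)]
                     for fc in center_freqs])

# Example: decompose a 100 Hz pulse train (a crude proxy for voiced speech)
fs = 16000
x = np.zeros(fs // 4)
x[::fs // 100] = 1.0
cfs = np.geomspace(80, 4000, 24)   # 24 bands, log-spaced like the cochlea
bands = gammatone_filterbank(x, fs, cfs)
print(bands.shape)  # → (24, 4000)
```

Running localization per sub-band on such a decomposition is what makes a frequency-selective criterion possible: bands dominated by a single speaker's harmonics can be weighted up, and corrupted bands down.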