Microphone arrays enable the enhancement of speech signals recorded in meeting rooms and office spaces. A common solution for speech enhancement in realistic environments with ambient noise and multi-path propagation is the application of so-called beamforming techniques. Beamforming algorithms enhance signals arriving from the desired direction through constructive interference while attenuating signals from other directions through destructive interference. However, these techniques require a priori knowledge of the source's time difference of arrival (TDOA), so source localization and tracking algorithms are an integral part of such a system. Conventional localization algorithms deteriorate in realistic scenarios with multiple concurrent speakers. In contrast to conventional methods, the techniques presented in this paper exploit the pitch information of speech signals in addition to the location information. The proposed "position-pitch" algorithm pre-processes the speech signals with a multiband gammatone filterbank inspired by the auditory model of the human inner ear; the role of this filterbank is analyzed and discussed in detail. For robust localization of multiple concurrent speakers, a frequency-selective criterion is explored, motivated by studies of how the human neural system exploits correlations between adjacent sub-band frequencies; this criterion improves localization performance. To further improve localization accuracy, an algorithm based on grouping spectro-temporal regions formed by pitch cues is presented. All proposed speaker localization algorithms are evaluated on a multichannel database in which multiple concurrent speakers are active. The real-world recordings were made with a 24-channel uniform circular microphone array, using both loudspeakers and human speakers, under various acoustic conditions including scenarios with moving concurrent speakers.
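The TDOA knowledge that beamforming requires is conventionally obtained with cross-correlation-based estimators such as the generalized cross-correlation with phase transform (GCC-PHAT). As background for the conventional approach the paper improves upon, here is a minimal GCC-PHAT sketch; it is illustrative only and not the paper's algorithm, and the function and variable names are the author's own:

```python
import numpy as np

def gcc_phat(sig, ref, fs=1.0):
    """Estimate the delay (in seconds) of `sig` relative to `ref`:
    positive if `sig` lags `ref`. Standard GCC-PHAT (Knapp & Carter, 1976)."""
    n = len(sig) + len(ref)                 # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(sig, n=n)
    Y = np.fft.rfft(ref, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # lags -n/2 .. n/2
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Synthetic check: a broadband signal and a copy delayed by 5 samples.
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
delay = 5
x = s
y = np.concatenate((np.zeros(delay), s[:-delay]))  # y lags x by 5 samples
print(gcc_phat(y, x))
```

The PHAT weighting discards the magnitude spectrum and keeps only phase, which sharpens the correlation peak under reverberation; in realistic multi-speaker scenarios this peak nevertheless degrades, which motivates the pitch-based cues explored in the paper.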
In the scenarios tested, the proposed techniques achieved localization performance significantly better than the state-of-the-art baseline.
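The gammatone front end mentioned above models the frequency selectivity of the human inner ear by decomposing the signal into overlapping auditory bands. The following sketch uses a common FIR approximation of a fourth-order gammatone filterbank with Glasberg–Moore ERB bandwidths; the band count, spacing, and truncation length are illustrative conventions, not necessarily the paper's exact configuration:

```python
import numpy as np

def erb(fc):
    # Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.05):
    # Fourth-order gammatone impulse response, truncated to `duration` seconds
    t = np.arange(int(duration * fs)) / fs
    g = t**3 * np.exp(-2 * np.pi * 1.019 * erb(fc) * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def gammatone_filterbank(x, fs, center_freqs):
    # One FIR gammatone filter per center frequency; rows are sub-band signals
    return np.stack([np.convolve(x, gammatone_ir(fc, fs))[:len(x)]
                     for fc in center_freqs])

# Example: decompose a 100 Hz pulse train (a crude proxy for voiced speech)
fs = 16000
x = np.zeros(fs // 4)
x[::fs // 100] = 1.0
cfs = np.geomspace(80, 4000, 24)   # 24 bands, log-spaced like the cochlea
bands = gammatone_filterbank(x, fs, cfs)
print(bands.shape)  # → (24, 4000)
```

Running localization per sub-band on such a decomposition is what makes a frequency-selective criterion possible: bands dominated by a single speaker's harmonics can be weighted up, and corrupted bands down.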