Distributed meetings: a meeting capture and broadcasting system
Proceedings of the tenth ACM international conference on Multimedia
A Robust Method for Speech Signal Time-Delay Estimation in Reverberant Rooms
ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97) -Volume 1 - Volume 1
Providing the basis for human-robot-interaction: a multi-modal attention system for a mobile robot
Proceedings of the 5th international conference on Multimodal interfaces
A multi-modal approach for determining speaker location and focus
Proceedings of the 5th international conference on Multimodal interfaces
A modulated complex lapped transform and its applications to audio processing
ICASSP '99 Proceedings of the Acoustics, Speech, and Signal Processing, 1999. on 1999 IEEE International Conference - Volume 03
EURASIP Journal on Applied Signal Processing
Marking up a world: visual markup for creating and manipulating virtual models
Proceedings of the 2nd International Conference on Immersive Telecommunications
Audiovisual Probabilistic Tracking of Multiple Speakers in Meetings
IEEE Transactions on Audio, Speech, and Language Processing
An Accurate Algebraic Closed-Form Solution for Energy-Based Source Localization
IEEE Transactions on Audio, Speech, and Language Processing
IEEE Transactions on Multimedia
Estimation of Acoustic Reflection Coefficients Through Pseudospectrum Matching
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
Hi-index | 0.00 |
Sound source localization (SSL) is an essential task in many applications involving speech capture and enhancement. As such, speaker localization with microphone arrays has received significant research attention. Nevertheless, existing SSL algorithms for small arrays still have two significant limitations: lack of range resolution, and accuracy degradation with increasing reverberation. The latter is natural and expected, given that strong reflections can have amplitudes similar to that of the direct signal, but different directions of arrival. Therefore, correctly modeling the room and compensating for the reflections should reduce the degradation due to reverberation. In this paper, we show a stronger result. If modeled correctly, early reflections can be used to provide more information about the source location than would have been available in an anechoic scenario. The modeling not only compensates for the reverberation, but also significantly increases resolution for range and elevation. Thus, we show that under certain conditions and limitations, reverberation can be used to improve SSL performance. Prior attempts to compensate for reverberation tried to model the room impulse response (RIR). However, RIRs change quickly with speaker position, and are nearly impossible to track accurately. Instead, we build a 3-D model of the room, which we use to predict early reflections, which are then incorporated into the SSL estimation. Simulation results with real and synthetic data show that even a simplistic room model is sufficient to produce significant improvements in range and elevation estimation, tasks which would be very difficult when relying only on direct path signal components.