Using reverberation to improve range and elevation discrimination for small array sound source localization

Authors:
Flavio Ribeiro;Cha Zhang;Dinei A. Florêncio;Demba Elimane Ba
Affiliations:
Electronic Systems Engineering Department, Escola Politécnica, Universidade de São Paulo, São Paulo, Brazil;Microsoft Research, Redmond, WA;Microsoft Research, Redmond, WA;Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA
Venue:
IEEE Transactions on Audio, Speech, and Language Processing - Special issue on processing reverberant speech: methodologies and applications
Year:
2010

Abstract

Sound source localization (SSL) is an essential task in many applications involving speech capture and enhancement. As such, speaker localization with microphone arrays has received significant research attention. Nevertheless, existing SSL algorithms for small arrays still have two significant limitations: lack of range resolution and accuracy degradation with increasing reverberation. The latter is natural and expected, given that strong reflections can have amplitudes similar to that of the direct signal, but different directions of arrival. Therefore, correctly modeling the room and compensating for the reflections should reduce the degradation due to reverberation. In this paper, we show a stronger result. If modeled correctly, early reflections can be used to provide more information about the source location than would have been available in an anechoic scenario. The modeling not only compensates for the reverberation, but also significantly increases resolution for range and elevation. Thus, we show that under certain conditions and limitations, reverberation can be used to improve SSL performance. Prior attempts to compensate for reverberation tried to model the room impulse response (RIR). However, RIRs change quickly with speaker position and are nearly impossible to track accurately. Instead, we build a 3-D model of the room and use it to predict early reflections, which are then incorporated into the SSL estimation. Simulation results with real and synthetic data show that even a simplistic room model is sufficient to produce significant improvements in range and elevation estimation, tasks which would be very difficult when relying only on direct path signal components.
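
To illustrate why a room model can carry range information, the following is a minimal sketch of predicting first-order reflections with the classical image-source construction. It is not the paper's algorithm, which the abstract does not specify. The shoebox room geometry, the function names `first_order_images` and `arrival_delays`, the coordinates, and the speed of sound are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def first_order_images(source, room):
    """Mirror the source across each of the six walls of a shoebox room.

    Assumption: `room` is (Lx, Ly, Lz) and the walls lie at 0 and L on
    each axis. Each image is the apparent origin of one first-order
    reflection.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]
            images.append(tuple(image))
    return images


def arrival_delays(source, mic, room):
    """Return (direct_delay, reflection_delays) in seconds for one microphone."""
    direct = math.dist(source, mic) / SPEED_OF_SOUND
    reflected = [
        math.dist(image, mic) / SPEED_OF_SOUND
        for image in first_order_images(source, room)
    ]
    return direct, reflected


if __name__ == "__main__":
    room = (5.0, 4.0, 3.0)   # hypothetical room dimensions in meters
    mic = (2.5, 2.0, 1.0)    # hypothetical microphone position
    # Two hypothetical sources in the same direction from the microphone,
    # at different ranges.
    for source in [(3.0, 2.0, 1.0), (4.0, 2.0, 1.0)]:
        direct, reflected = arrival_delays(source, mic, room)
        ceiling = reflected[5]  # image across the wall at z = Lz
        print(source, "ceiling echo lags direct path by", ceiling - direct, "s")
```

In this sketch, the two sources lie in the same direction from the microphone, so a direct-path-only model of a small array would see nearly the same direction of arrival for both. The lag of the ceiling reflection behind the direct path differs between them, which is the kind of extra cue a room model makes available for range and elevation.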