We propose an algorithm for multimodal object localization using a depth sensor and stereo microphones. We formulate a joint probability distribution over object locations conditioned on the depth and acoustic observations, and localize the object by maximum a posteriori estimation. To fuse the two modalities, we map the likelihood of the acoustic observation given time-difference-of-arrival information to the likelihood given an object location in three-dimensional space. The method offers a principled way to fuse information from microphones and depth sensors, and in our experiments it reliably locates the object without requiring careful calibration of the sensors.
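The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The Gaussian model for the time-difference-of-arrival (TDOA) likelihood, the uniform prior over locations, the finite set of candidate locations, and all names (`tdoa`, `log_gauss`, `map_location`, `sigma_tdoa`) are assumptions made for illustration. The sketch evaluates the acoustic likelihood at the TDOA implied by each candidate 3D location, adds a depth log-likelihood assumed to be supplied per candidate, and returns the maximum a posteriori candidate.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def tdoa(x, mic_a, mic_b, c=SPEED_OF_SOUND):
    """TDOA in seconds at two microphones for a source at 3D point x."""
    return (math.dist(x, mic_a) - math.dist(x, mic_b)) / c


def log_gauss(value, mean, sigma):
    """Log density of a 1D Gaussian (assumed TDOA noise model)."""
    return (-0.5 * ((value - mean) / sigma) ** 2
            - math.log(sigma * math.sqrt(2.0 * math.pi)))


def map_location(candidates, depth_loglik, tdoa_obs, mic_a, mic_b,
                 sigma_tdoa=5e-5):
    """Return the candidate maximizing the fused log-posterior.

    Score = log p(depth | x) + log p(audio | tdoa(x)), with a uniform
    prior over candidates (an assumption of this sketch).
    """
    best, best_score = None, -math.inf
    for x, d_ll in zip(candidates, depth_loglik):
        score = d_ll + log_gauss(tdoa_obs, tdoa(x, mic_a, mic_b), sigma_tdoa)
        if score > best_score:
            best, best_score = x, score
    return best


# Hypothetical setup: two microphones 20 cm apart, three candidate points.
mic_a = (-0.1, 0.0, 0.0)
mic_b = (0.1, 0.0, 0.0)
candidates = [(-1.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 0.0, 2.0)]
# Depth alone cannot separate the first two candidates (equal likelihood).
depth_loglik = [-1.0, -1.0, -5.0]
# Simulated noise-free acoustic observation from the source at (1, 0, 2).
tdoa_obs = tdoa((1.0, 0.0, 2.0), mic_a, mic_b)

estimate = map_location(candidates, depth_loglik, tdoa_obs, mic_a, mic_b)
print(estimate)
```

In this toy setup the depth term alone leaves two candidates tied, and the TDOA term breaks the tie. This shows why mapping the acoustic likelihood into location space allows the two modalities to be combined in a single posterior.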