Audio-Visual Speaker Localization Using Graphical Models

  • Authors:
  • Akash Kushal, Mandar Rahurkar, Li Fei-Fei, Jean Ponce, Thomas Huang

  • Affiliation:
  • University of Illinois, Urbana-Champaign (all authors)

  • Venue:
  • ICPR '06 Proceedings of the 18th International Conference on Pattern Recognition - Volume 01
  • Year:
  • 2006

Abstract

In this work, we propose an approach that combines audio and video modalities for person tracking using graphical models. We present a principled and intuitive framework for fusing these modalities to obtain robustness against occlusion and changes in appearance. We further exploit the temporal correlations between adjacent frames of a moving object to handle cases where both modalities together may still be insufficient, e.g., when the person being tracked is occluded and not speaking. The improvement in tracking results is shown at each step and compared against manually annotated ground truth.
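
The paper does not give its exact model here, but the idea of fusing per-frame audio and video evidence with a temporal prior in a chain-structured graphical model can be illustrated with a minimal sketch. The grid of candidate positions, the synthetic likelihoods, the Gaussian transition scores, and the conditional-independence assumption below are all illustrative assumptions, not details taken from the paper; decoding uses standard Viterbi dynamic programming.

```python
# Minimal sketch (not the authors' exact model): fuse per-frame audio and
# video log-likelihoods over K discretized speaker positions with a simple
# Markov-chain temporal prior, and decode the track with Viterbi.
import numpy as np


def viterbi_track(audio_ll, video_ll, transition_ll):
    """Most likely position sequence under a chain graphical model.

    audio_ll, video_ll : (T, K) log-likelihoods of each candidate position
        at each frame, one array per modality.
    transition_ll      : (K, K) log transition scores encoding temporal
        smoothness between adjacent frames.
    """
    T, K = audio_ll.shape
    # Per-frame evidence: product of the modalities' likelihoods (sum in log
    # space), assuming they are conditionally independent given the position.
    unary = audio_ll + video_ll

    score = unary[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transition_ll      # (prev, cur) scores
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + unary[t]

    # Backtrack the best path.
    path = np.zeros(T, dtype=int)
    path[-1] = score.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, K = 50, 20                                  # frames, candidate positions
    true_pos = np.clip(np.cumsum(rng.integers(-1, 2, T)) + K // 2, 0, K - 1)

    # Synthetic stand-ins for acoustic and appearance likelihoods, peaked
    # near the true position with additive noise.
    grid = np.arange(K)
    audio_ll = -0.5 * ((grid - true_pos[:, None]) / 3.0) ** 2 + rng.normal(0, 0.5, (T, K))
    video_ll = -0.5 * ((grid - true_pos[:, None]) / 2.0) ** 2 + rng.normal(0, 0.5, (T, K))

    # Gaussian-shaped transition scores favour small frame-to-frame motion.
    transition_ll = -0.5 * ((grid[:, None] - grid[None, :]) / 1.5) ** 2

    est = viterbi_track(audio_ll, video_ll, transition_ll)
    print("mean abs. position error:", np.abs(est - true_pos).mean())
```

The temporal term is what lets the sketch coast through frames where neither modality is informative (e.g., an occluded, silent speaker): the transition prior keeps the decoded position near its recent trajectory until evidence returns.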