Audio-Video Sensor Fusion with Probabilistic Graphical Models

Authors:
Matthew J. Beal; Hagai Attias; Nebojsa Jojic
Venue:
ECCV '02: Proceedings of the 7th European Conference on Computer Vision, Part I
Year:
2002

Abstract

We present a new approach to modeling and processing multimedia data. This approach is based on graphical models that combine audio and video variables. We demonstrate it by developing a new algorithm for tracking a moving object in a cluttered, noisy scene using two microphones and a camera. Our model uses unobserved variables to describe the data in terms of the process that generates them. It is therefore able to capture and exploit the statistical structure of the audio and video data separately, as well as their mutual dependencies. Model parameters are learned from data via an EM algorithm, and automatic calibration is performed as part of this procedure. Tracking is done by Bayesian inference of the object location from data. We demonstrate successful performance on multimedia clips captured in real-world scenarios using off-the-shelf equipment.
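
As a rough illustration of what "Bayesian inference of the object location" from audio and video can look like, the sketch below fuses two observations over a grid of candidate horizontal positions. It is not the model from the paper. It assumes, purely for illustration, that the inter-microphone time delay is a noisy linear function of position (hypothetical calibration parameters `alpha` and `beta`), that the camera supplies a noisy position estimate, and that the prior over positions is uniform. All function names, parameters, and numbers are hypothetical, and the parameters are fixed by hand here rather than learned with EM.

```python
import math


def gaussian_logpdf(value, mean, std):
    """Log density of a 1-D Gaussian with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi * std * std) - 0.5 * ((value - mean) / std) ** 2


def fuse_location(positions, audio_delay, video_position,
                  alpha, beta, audio_std, video_std):
    """Posterior over candidate positions given one audio and one video observation.

    Assumed (hypothetical) observation models:
      audio_delay    ~ Normal(alpha * x + beta, audio_std)
      video_position ~ Normal(x, video_std)
    with a uniform prior over the grid of candidate positions x.
    """
    log_post = [
        gaussian_logpdf(audio_delay, alpha * x + beta, audio_std)
        + gaussian_logpdf(video_position, x, video_std)
        for x in positions
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Candidate horizontal positions from -2.0 to 2.0 in steps of 0.1 (arbitrary units).
positions = [i * 0.1 for i in range(-20, 21)]

posterior = fuse_location(
    positions,
    audio_delay=0.4,     # with alpha=1, beta=0 the audio alone suggests x = 0.4
    video_position=0.8,  # the video alone suggests x = 0.8
    alpha=1.0,
    beta=0.0,
    audio_std=0.5,
    video_std=0.5,
)

# Maximum a posteriori position on the grid.
best_index = max(range(len(positions)), key=lambda i: posterior[i])
map_position = positions[best_index]
```

Because both assumed noise levels are equal in this example, the fused estimate falls midway between the positions suggested by audio and video alone. Making one modality noisier (a larger standard deviation) shifts the estimate toward the other, which is the basic benefit of modeling the two sensors jointly.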