A Graphical Model for Audiovisual Object Tracking

Authors:
Matthew J. Beal; Nebojsa Jojic; Hagai Attias
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2003

Abstract

We present a new approach to modeling and processing multimedia data. This approach is based on graphical models that combine audio and video variables. We demonstrate it by developing a new algorithm for tracking a moving object in a cluttered, noisy scene using two microphones and a camera. Our model uses unobserved variables to describe the data in terms of the process that generates them. It is therefore able to capture and exploit the statistical structure of the audio and video data separately, as well as their mutual dependencies. Model parameters are learned from data via an EM algorithm, and automatic calibration is performed as part of this procedure. Tracking is done by Bayesian inference of the object location from data. We demonstrate successful performance on multimedia clips captured in real-world scenarios using off-the-shelf equipment.
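
The idea of tracking by Bayesian inference from both modalities can be illustrated with a toy sketch. This is not the authors' model: it assumes a single horizontal position on a discrete grid, a uniform prior, a Gaussian video observation of that position, and a Gaussian inter-microphone time delay that depends linearly on position. The calibration parameters `alpha` and `beta`, the variances, and all function names are hypothetical and are fixed by hand here, whereas the abstract states that the parameters are learned with EM.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def location_posterior(positions, delay_obs, video_obs,
                       alpha, beta, delay_var, video_var):
    """Posterior over candidate positions, assuming a uniform prior.

    Audio term (assumed): delay ~ N(alpha * x + beta, delay_var)
    Video term (assumed): video_obs ~ N(x, video_var)
    """
    log_post = []
    for x in positions:
        log_audio = gaussian_logpdf(delay_obs, alpha * x + beta, delay_var)
        log_video = gaussian_logpdf(video_obs, x, video_var)
        log_post.append(log_audio + log_video)

    # Normalize in a numerically stable way (subtract the maximum first).
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical setup: 101 pixel columns, delay measured in samples.
positions = list(range(101))
posterior = location_posterior(
    positions,
    delay_obs=2.0,    # observed time delay between the two microphones
    video_obs=60.0,   # noisy visual estimate of the object column
    alpha=0.1,        # assumed audio-video calibration slope
    beta=-5.0,        # assumed calibration offset
    delay_var=1.0,
    video_var=100.0,
)
map_position = positions[posterior.index(max(posterior))]
print("MAP position:", map_position)
```

With these made-up numbers the audio cue alone points to column 70 and the video cue to column 60. Because both cues have the same effective uncertainty in position, the fused estimate falls between them.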