We present a new approach to modeling and processing multimedia data. This approach is based on graphical models that combine audio and video variables. We demonstrate it by developing a new algorithm for tracking a moving object in a cluttered, noisy scene using two microphones and a camera. Our model uses unobserved variables to describe the data in terms of the process that generates them. It is therefore able to capture and exploit the statistical structure of the audio and video data separately, as well as their mutual dependencies. Model parameters are learned from data via an EM algorithm, and automatic calibration is performed as part of this procedure. Tracking is performed by Bayesian inference of the object's location from the data. We demonstrate successful performance on multimedia clips captured in real-world scenarios using off-the-shelf equipment.
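As a rough illustration of inferring an object's location from audio and video evidence, the sketch below computes a posterior over candidate horizontal positions on a grid. It is not the model from the paper. It assumes, purely for illustration, that the inter-microphone time delay is a noisy linear function of position (the hypothetical calibration parameters `alpha` and `beta`), that the camera yields a noisy position estimate, and that the prior over positions is flat. All function names, parameters, and numeric values are invented for this example.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def location_posterior(grid, tau_obs, video_obs, alpha, beta, var_audio, var_video):
    """Posterior over candidate locations, assuming a flat prior.

    Assumed (hypothetical) observation models:
      audio: tau_obs   ~ N(alpha * x + beta, var_audio)
      video: video_obs ~ N(x, var_video)
    """
    log_post = [
        gaussian_logpdf(tau_obs, alpha * x + beta, var_audio)
        + gaussian_logpdf(video_obs, x, var_video)
        for x in grid
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative values only: positions in pixels, delay in arbitrary units.
grid = list(range(0, 101))
post = location_posterior(
    grid,
    tau_obs=0.5,
    video_obs=62.0,
    alpha=0.02,
    beta=-1.0,
    var_audio=0.04,
    var_video=25.0,
)

# Maximum a posteriori location on the grid.
x_map = grid[max(range(len(grid)), key=post.__getitem__)]
print(x_map)
```

With these values the audio observation alone points to a position near 75 and the video observation to 62. The combined estimate falls between them, closer to the video estimate because it is the less noisy cue here. In a learned model, parameters such as `alpha` and `beta` would be estimated from data (for example by EM) rather than fixed by hand.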