It is often advantageous to track objects in a scene using multimodal information when such information is available. We use audio as a modality complementary to video: compared with vision, it can provide faster localization over a wider field of view. We present a particle-filter-based tracking framework that performs multimodal sensor fusion to track people in a videoconferencing environment using multiple cameras and multiple microphone arrays. One advantage of the proposed tracker is that it seamlessly handles the temporary absence of some measurements (e.g., camera occlusion or silence). Another is that the joint system can self-calibrate to compensate for imprecise knowledge of array or camera parameters, by treating those parameters as containing an unknown statistical component that is estimated within the particle-filter framework during tracking. We implement the algorithm in the context of a videoconferencing and meeting-recording system. The system also performs high-level semantic analysis of the scene: it maintains participant tracks, recognizes turn-taking events, and records an annotated transcript of the meeting. Experimental results are presented; the system operates in real time and is shown to be robust and reliable.
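The fusion idea described above can be sketched in a few lines. This is not the authors' implementation; the 2-D random-walk motion model, Gaussian likelihoods, and all noise parameters below are illustrative assumptions. The key property from the abstract is reproduced: each modality contributes a likelihood factor only when its measurement is present, so camera occlusion or silence simply drops that factor for the frame.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, video_meas, audio_meas,
                         motion_std=0.1, video_std=0.2, audio_std=0.5):
    """One predict/update/resample cycle of a toy audio-visual particle
    filter. Either measurement may be None (occlusion or silence); the
    corresponding likelihood term is then skipped."""
    # Predict: random-walk motion model on 2-D position (an assumption).
    particles = particles + rng.normal(0.0, motion_std, particles.shape)

    # Update: multiply in a Gaussian likelihood per available modality.
    for meas, std in ((video_meas, video_std), (audio_meas, audio_std)):
        if meas is not None:
            d2 = np.sum((particles - meas) ** 2, axis=1)
            weights = weights * np.exp(-0.5 * d2 / std ** 2)
    weights = weights / np.sum(weights)

    # Systematic resampling to avoid weight degeneracy.
    n = len(weights)
    positions = (np.arange(n) + rng.random()) / n
    idx = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
    return particles[idx], np.full(n, 1.0 / n)

# Toy run: a stationary speaker at (1, 2), with video available only on
# odd frames and audio only on even frames, so each step loses one
# modality yet the track persists.
particles = rng.normal(0.0, 2.0, (500, 2))
weights = np.full(500, 1.0 / 500)
target = np.array([1.0, 2.0])
for t in range(30):
    video = target + rng.normal(0, 0.2, 2) if t % 2 else None
    audio = target + rng.normal(0, 0.5, 2) if not t % 2 else None
    particles, weights = particle_filter_step(particles, weights, video, audio)
estimate = particles.mean(axis=0)  # posterior-mean position estimate
```

The self-calibration mentioned in the abstract would amount to augmenting each particle's state with the uncertain sensor parameters, so the same filter estimates them jointly with the target position.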