It is often advantageous to track objects in a scene using multimodal information when such information is available. We use audio as a modality complementary to video: compared with vision, it can provide faster localization over a wider field of view. We present a particle-filter-based tracking framework that performs multimodal sensor fusion to track people in a videoconferencing environment using multiple cameras and multiple microphone arrays. One advantage of the proposed tracker is that it seamlessly handles the temporary absence of some measurements (e.g., camera occlusion or silence). Another is that the joint system can self-calibrate to compensate for imprecise knowledge of array or camera parameters, by treating those parameters as containing an unknown statistical component that is estimated within the particle-filter framework during tracking. We implement the algorithm in the context of a videoconferencing and meeting-recording system. The system also performs high-level semantic analysis of the scene: it maintains participant tracks, recognizes turn-taking events, and records an annotated transcript of the meeting. Experimental results are presented; the system operates in real time and is shown to be robust and reliable.
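The fusion idea described above can be sketched in a few lines. This is not the authors' implementation; the 2-D random-walk motion model, Gaussian likelihoods, and all noise parameters below are illustrative assumptions. The key property from the abstract is reproduced: each modality contributes a likelihood factor only when its measurement is present, so camera occlusion or silence simply drops that factor for the frame.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, video_meas, audio_meas,
                         motion_std=0.1, video_std=0.2, audio_std=0.5):
    """One predict/update/resample cycle of a toy audio-visual particle
    filter. Either measurement may be None (occlusion or silence); the
    corresponding likelihood term is then skipped."""
    # Predict: random-walk motion model on 2-D position (an assumption).
    particles = particles + rng.normal(0.0, motion_std, particles.shape)

    # Update: multiply in a Gaussian likelihood per available modality.
    for meas, std in ((video_meas, video_std), (audio_meas, audio_std)):
        if meas is not None:
            d2 = np.sum((particles - meas) ** 2, axis=1)
            weights = weights * np.exp(-0.5 * d2 / std ** 2)
    weights = weights / np.sum(weights)

    # Systematic resampling to avoid weight degeneracy.
    n = len(weights)
    positions = (np.arange(n) + rng.random()) / n
    idx = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
    return particles[idx], np.full(n, 1.0 / n)

# Toy run: a stationary speaker at (1, 2), with video available only on
# odd frames and audio only on even frames, so each step loses one
# modality yet the track persists.
particles = rng.normal(0.0, 2.0, (500, 2))
weights = np.full(500, 1.0 / 500)
target = np.array([1.0, 2.0])
for t in range(30):
    video = target + rng.normal(0, 0.2, 2) if t % 2 else None
    audio = target + rng.normal(0, 0.5, 2) if not t % 2 else None
    particles, weights = particle_filter_step(particles, weights, video, audio)
estimate = particles.mean(axis=0)  # posterior-mean position estimate
```

The self-calibration mentioned in the abstract would amount to augmenting each particle's state with the uncertain sensor parameters, so the same filter estimates them jointly with the target position.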