CONDENSATION—Conditional Density Propagation for Visual Tracking
International Journal of Computer Vision
A framework for speech source localization using sensor arrays
Joint audio-visual tracking using particle filters
EURASIP Journal on Applied Signal Processing
We present an approach for tracking a lecturer over the course of a lecture. We use features from multiple cameras and microphones, and process them in a joint particle filter framework. The filter performs sampled projections of 3D location hypotheses and scores them using features from both audio and video. On the video side, the features are based on foreground segmentation, multi-view face detection and upper-body detection. On the audio side, the time delays of arrival between pairs of microphones are estimated with a generalized cross-correlation function. In the CLEAR'06 evaluation, the system yielded a tracking accuracy (MOTA) of 71% for video-only, 55% for audio-only and 90% for combined audio-visual tracking.
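The audio cue described above rests on estimating the time delay of arrival (TDOA) between microphone pairs with a generalized cross-correlation. A common variant is GCC with PHAT weighting, which whitens the cross-spectrum so only phase information remains; the sketch below is illustrative and assumes PHAT weighting and the function name `gcc_phat`, neither of which is specified by the abstract.

```python
import numpy as np

def gcc_phat(sig, ref, fs=16000, max_tau=None):
    """Estimate the TDOA (seconds) between two microphone signals using
    generalized cross-correlation with PHAT weighting (illustrative sketch;
    the paper does not state its exact weighting function)."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15              # PHAT: discard magnitude, keep phase
    cc = np.fft.irfft(R, n=n)           # whitened cross-correlation
    max_shift = n // 2
    if max_tau is not None:             # optionally bound the search window
        max_shift = min(int(fs * max_tau), max_shift)
    # Rearrange so negative lags precede positive lags
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Synthetic check: delay a noise signal by 40 samples (2.5 ms at 16 kHz)
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
delay = 40
y = np.concatenate((np.zeros(delay), x[:-delay]))
tau = gcc_phat(y, x, fs=fs)             # expected: 40 / 16000 = 0.0025 s
```

Given the geometry of the microphone array, each such pairwise delay constrains the source to a hyperboloid, which is how a particle filter can score 3D location hypotheses against the audio observations.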