Audio-visual active speaker tracking in cluttered indoors environments

  • Authors:
  • Fotios Talantzis, Aristodemos Pnevmatikakis, Anthony G. Constantinides

  • Affiliations:
  • Fotios Talantzis: Autonomic and Grid Computing Group, Athens Information Technology, Athens, Greece, and Department of Electrical and Electronic Engineering, Imperial College London, London, UK
  • Aristodemos Pnevmatikakis: Autonomic and Grid Computing Group, Athens Information Technology, Athens, Greece
  • Anthony G. Constantinides: Department of Electrical and Electronic Engineering, Imperial College London, London, UK

  • Venue:
  • IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics - Special issue on human computing
  • Year:
  • 2009


Abstract

We propose a system for detecting the active speaker in cluttered and reverberant environments where more than one person speaks and moves. Rather than relying on audio alone, the system uses audiovisual information from multiple acoustic and video sensors that feed separate audio and video tracking modules. The audio module combines a particle filter (PF) with an information-theoretic framework to provide accurate acoustic source localization under reverberant conditions. The video subsystem combines in 3-D a number of 2-D trackers, each based on a variation of Stauffer's adaptive background algorithm with spatiotemporal adaptation of the learning parameters and a Kalman tracker in a feedback configuration. Extensive experiments show that fusing the two modalities yields gains in detecting the active speaker.
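
The core of the audio module is a particle filter that maintains a cloud of 3-D source-position hypotheses and rescores them at each time step. The following is a minimal, hypothetical Python sketch of such a bootstrap particle filter, not the authors' implementation: the Gaussian placeholder likelihood stands in for the paper's information-theoretic scoring of the microphone signals, and all function names, parameters, and the toy observation sequence are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(particles, motion_std=0.05):
    """Random-walk motion model: jitter each 3-D position hypothesis."""
    return particles + rng.normal(0.0, motion_std, particles.shape)

def likelihood(particles, observation, obs_std=0.3):
    """Placeholder Gaussian likelihood around an observed 3-D position.
    The paper instead scores particles with an information-theoretic
    measure computed from the acoustic sensor signals."""
    d2 = np.sum((particles - observation) ** 2, axis=1)
    return np.exp(-0.5 * d2 / obs_std ** 2)

def resample(particles, weights):
    """Systematic resampling to refocus particles on likely positions."""
    n = len(particles)
    positions = (np.arange(n) + rng.random()) / n
    idx = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
    return particles[idx]

def track(observations, n_particles=500):
    """Bootstrap particle filter over a sequence of noisy 3-D fixes."""
    particles = rng.uniform(-1.0, 1.0, (n_particles, 3))  # room coordinates (m)
    estimates = []
    for z in observations:
        particles = predict(particles)
        w = likelihood(particles, z)
        w /= w.sum()
        estimates.append(particles.T @ w)  # weighted-mean position estimate
        particles = resample(particles, w)
    return np.array(estimates)

# Toy run: a source drifting along x, observed with additive noise.
truth = np.stack([np.linspace(-0.5, 0.5, 50), np.zeros(50), np.zeros(50)], axis=1)
print(track(truth + rng.normal(0.0, 0.1, truth.shape))[-1])
```

The sketch covers only the generic predict-weight-resample loop; in the paper's full system the audio-side filter output would further be fused with the 3-D estimates produced by the video trackers to decide which speaker is active.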