We propose a system for detecting the active speaker in cluttered and reverberant environments where more than one person speaks and moves. Rather than relying on audio alone, the system uses audiovisual information from multiple acoustic and video sensors that feed separate audio and video tracking modules. The audio module uses a particle filter (PF) and an information-theoretic framework to locate the acoustic source accurately under reverberant conditions. The video subsystem combines several 2-D trackers in 3-D, each based on a variation of Stauffer's adaptive background algorithm, with spatiotemporal adaptation of the learning parameters, and a Kalman tracker in a feedback configuration. Extensive experiments indicate that fusing the separate modalities improves detection of the active speaker.
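As a rough illustration of the particle-filter idea used by the audio module, the sketch below runs a generic bootstrap particle filter on a 1-D source position. It is not the paper's method: the random-walk motion model, the Gaussian likelihood (a stand-in for the information-theoretic measurement framework), and every name and parameter (`particle_filter_step`, `track`, `motion_std`, `meas_std`) are assumptions made for this example.

```python
import math
import random


def particle_filter_step(particles, measurement, rng, motion_std=0.1, meas_std=0.3):
    """One predict-update-resample cycle of a bootstrap particle filter (1-D)."""
    # Predict: assumed random-walk motion model for the source position.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]

    # Update: weight each particle by an assumed Gaussian measurement likelihood.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2) for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Point estimate: weighted mean of the predicted particles.
    estimate = sum(w * p for w, p in zip(weights, predicted))

    # Resample: multinomial resampling proportional to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


def track(measurements, n_particles=500, seed=0):
    """Run the filter over a sequence of noisy position measurements."""
    rng = random.Random(seed)
    # Initialise particles uniformly over an assumed search interval [-5, 5].
    particles = [rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        particles, est = particle_filter_step(particles, z, rng)
        estimates.append(est)
    return estimates
```

Calling `track([2.0] * 20)` returns one position estimate per measurement. A real acoustic tracker would replace the Gaussian likelihood with a measurement function derived from the microphone signals and would track a 2-D or 3-D state.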