Tracking speakers in multiparty conversations is a fundamental task in automatic meeting analysis. In this paper, we present a novel probabilistic approach to jointly track the location and speaking activity of multiple speakers in a multisensor meeting room equipped with a small microphone array and multiple uncalibrated cameras. Our framework is based on a mixed-state dynamic graphical model defined on a multiperson state-space, which includes the explicit definition of a proximity-based interaction model. The model integrates audiovisual (AV) data through a novel observation model. Audio observations are derived from a source localization algorithm. Visual observations are based on models of the shape and spatial structure of human heads. Exact inference is impractical given the model's complexity, so approximate inference is performed with a Markov Chain Monte Carlo particle filter (MCMC-PF), which results in high sampling efficiency. We present results, based on an objective evaluation procedure, showing that our framework (1) is capable of locating and tracking the position and speaking activity of multiple meeting participants engaged in real conversations with good accuracy, (2) can deal with cases of visual clutter and occlusion, and (3) significantly outperforms a traditional sampling-based approach.
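To make the idea of an MCMC particle filter with a proximity-based interaction term more concrete, the following is a minimal toy sketch of one filtering step. It is not the paper's implementation. It assumes 1D positions, Gaussian dynamics and observations, a simple linear proximity penalty, and a single-person random-walk Metropolis-Hastings proposal, and it leaves out the speaking-activity (mixed-state) variables and the actual audio and visual observation models. All function names and parameter values are hypothetical.

```python
import math
import random


def gauss_logpdf(x, mean, sigma):
    """Log density of a 1D Gaussian."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def log_interaction(state, min_dist=0.5, strength=5.0):
    """Assumed proximity penalty: lowers the score when two people are closer than min_dist."""
    total = 0.0
    for i in range(len(state)):
        for j in range(i + 1, len(state)):
            gap = abs(state[i] - state[j])
            if gap < min_dist:
                total -= strength * (min_dist - gap)
    return total


def log_likelihood(state, observations, obs_sigma=0.3):
    """Toy observation model: one noisy position measurement per person."""
    return sum(gauss_logpdf(z, x, obs_sigma) for x, z in zip(state, observations))


def log_dynamics_mixture(state, prev_particles, dyn_sigma=0.2):
    """Log of the average transition density over the previous (unweighted) particles."""
    logs = [
        sum(gauss_logpdf(x, xp, dyn_sigma) for x, xp in zip(state, particle))
        for particle in prev_particles
    ]
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in logs)) - math.log(len(logs))


def log_target(state, prev_particles, observations):
    """Unnormalized log posterior at the current time step."""
    return (
        log_likelihood(state, observations)
        + log_interaction(state)
        + log_dynamics_mixture(state, prev_particles)
    )


def mcmc_pf_step(prev_particles, observations, n_samples=200, burn_in=100,
                 step=0.15, rng=None):
    """Draw unweighted particles for the new time step with Metropolis-Hastings."""
    rng = rng or random.Random(0)
    current = list(rng.choice(prev_particles))
    current_lp = log_target(current, prev_particles, observations)
    samples = []
    for it in range(burn_in + n_samples):
        # Perturb a single, randomly chosen person (symmetric random-walk proposal).
        i = rng.randrange(len(current))
        proposal = list(current)
        proposal[i] += rng.gauss(0.0, step)
        proposal_lp = log_target(proposal, prev_particles, observations)
        # Metropolis-Hastings acceptance test for a symmetric proposal.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if it >= burn_in:
            samples.append(tuple(current))
    return samples


# Toy usage: two people near positions 1.0 and 3.0.
demo_rng = random.Random(1)
demo_prev = [(1.0 + demo_rng.gauss(0, 0.05), 3.0 + demo_rng.gauss(0, 0.05))
             for _ in range(50)]
demo_particles = mcmc_pf_step(demo_prev, observations=(1.2, 2.9), rng=demo_rng)
print(sum(p[0] for p in demo_particles) / len(demo_particles))
```

Because each proposal changes only one person's state, the chain stays in plausible regions of the joint multiperson space without drawing every person's state at once. That is one common explanation for the sampling efficiency of this family of filters compared with importance sampling over the full joint state.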