Multimodal multispeaker probabilistic tracking in meetings

Authors:
Daniel Gatica-Perez; Guillaume Lathoud; Jean-Marc Odobez; Iain McCowan
Affiliations:
IDIAP Research Institute, Martigny, Switzerland; IDIAP Research Institute, Martigny, Switzerland; IDIAP Research Institute, Martigny, Switzerland; eHealth Research Centre, Brisbane, Australia
Venue:
ICMI '05: Proceedings of the 7th International Conference on Multimodal Interfaces
Year:
2005

Abstract

Tracking speakers in multiparty conversations is a fundamental task for automatic meeting analysis. In this paper, we present a probabilistic approach to jointly track the location and speaking activity of multiple speakers in a multisensor meeting room equipped with a small microphone array and multiple uncalibrated cameras. Our framework is based on a mixed-state dynamic graphical model defined on a multiperson state space, which includes an explicit proximity-based interaction model. The model integrates audio-visual (AV) data through a novel observation model: audio observations are derived from a source localization algorithm, and visual observations are based on models of the shape and spatial structure of human heads. Given the complexity of the model, inference is approximated with a Markov Chain Monte Carlo particle filter (MCMC-PF), which achieves high sampling efficiency. Results obtained with an objective evaluation procedure show that our framework (1) can locate and track the position and speaking activity of multiple meeting participants engaged in real conversations with good accuracy; (2) can handle visual clutter and partial occlusion; and (3) significantly outperforms a traditional sampling-based approach.
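The abstract combines three ideas: a mixed state per person (continuous location plus a discrete speaking flag), a proximity-based interaction term over pairs of people, and an MCMC particle filter that updates one person at a time. The toy sketch below shows how these pieces can fit together. It is not the authors' model. The Gaussian position likelihood, the Bernoulli speaking likelihood, the random-walk dynamics, the pairwise penalty, and every function name and parameter (`d0`, `strength`, `sigma`, `p_match`, `p_switch`, `burn_in`) are hypothetical stand-ins. The sketch also assumes that the predictive prior factorizes across people. Under that assumption, drawing person k's proposal from its own predictive mixture makes the predictive terms cancel, so the Metropolis-Hastings ratio reduces to likelihood times interaction.

```python
import math
import random

# Hypothetical toy model: a joint state is a list of per-person mixed states
# (x, y, speaking), with a continuous 2-D position and a binary speaking flag.


def interaction_log_potential(joint, d0=1.0, strength=5.0):
    """Proximity-based pairwise term: penalise pairs of people closer than d0."""
    total = 0.0
    for i in range(len(joint)):
        for j in range(i + 1, len(joint)):
            d = math.hypot(joint[i][0] - joint[j][0], joint[i][1] - joint[j][1])
            if d < d0:
                total -= strength * (d0 - d)
    return total


def person_log_likelihood(person, obs, sigma=0.5, p_match=0.8):
    """Toy AV likelihood: Gaussian on position (stand-in for visual head cues)
    times a Bernoulli on the speaking flag (stand-in for audio localisation)."""
    (x, y, s), (ox, oy, o_s) = person, obs
    ll = -((x - ox) ** 2 + (y - oy) ** 2) / (2.0 * sigma ** 2)
    ll += math.log(p_match if s == o_s else 1.0 - p_match)
    return ll


def propagate(person, rng, pos_std=0.2, p_switch=0.1):
    """Toy mixed-state dynamics: random walk on position, Markov switch on speaking."""
    x, y, s = person
    s_new = 1 - s if rng.random() < p_switch else s
    return (x + rng.gauss(0.0, pos_std), y + rng.gauss(0.0, pos_std), s_new)


def log_score(joint, observations):
    """Log of likelihood x interaction (the predictive prior enters via the proposal)."""
    ll = sum(person_log_likelihood(p, o) for p, o in zip(joint, observations))
    return ll + interaction_log_potential(joint)


def mcmc_pf_step(prev_particles, observations, rng, n_samples=200, burn_in=100):
    """One filtering step: a Metropolis-Hastings chain over the joint state."""
    n_people = len(prev_particles[0])
    # Start the chain from one previous joint particle pushed through the dynamics.
    current = [propagate(p, rng) for p in rng.choice(prev_particles)]
    current_score = log_score(current, observations)
    samples = []
    for it in range(burn_in + n_samples):
        k = rng.randrange(n_people)
        # Propose a new state for person k only, drawn from that person's
        # predictive mixture (a random previous particle pushed through dynamics).
        proposal = list(current)
        proposal[k] = propagate(rng.choice(prev_particles)[k], rng)
        proposal_score = log_score(proposal, observations)
        # Predictive density cancels with the proposal density, so the
        # acceptance ratio is the ratio of likelihood x interaction.
        if rng.random() < math.exp(min(0.0, proposal_score - current_score)):
            current, current_score = proposal, proposal_score
        if it >= burn_in:
            samples.append(current)  # safe: `current` is replaced, never mutated
    return samples


def estimate(samples):
    """Posterior mean position and speaking probability for each person."""
    n = len(samples)
    return [
        (sum(s[k][0] for s in samples) / n,
         sum(s[k][1] for s in samples) / n,
         sum(s[k][2] for s in samples) / n)
        for k in range(len(samples[0]))
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [[(0.0, 0.0, 1), (2.0, 0.0, 0)] for _ in range(200)]
    obs = [(0.1, -0.1, 1), (1.9, 0.1, 0)]  # hypothetical per-person observations
    for _ in range(5):
        particles = mcmc_pf_step(particles, obs, rng)
    for k, (x, y, p) in enumerate(estimate(particles)):
        print(f"person {k}: pos=({x:.2f}, {y:.2f}) P(speaking)={p:.2f}")
```

The sketch updates a single person per MCMC move. This is the usual motivation for MCMC in a multiperson state space: each proposal changes only a low-dimensional part of the joint state, so it is accepted far more often than a joint proposal over all people would be. Only the interaction term couples the people.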