We address the problem of localizing individuals in a scene where several people are engaged in a multi-speaker conversation. We use a human-like configuration of sensors (binaural and binocular) to gather both auditory and visual observations. We show that the localization problem can be recast as the task of clustering the audio-visual observations into coherent groups. We propose a probabilistic generative model that captures the relations between audio and visual observations. This model maps the data to a representation of the common 3D scene space via a pair of Gaussian mixture models. Inference is performed by a variant of the Expectation-Maximization algorithm, which provides cooperative estimates of both the activity (speaking or not) and the 3D position of each speaker.
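The clustering step described above can be illustrated with a minimal sketch: running EM on a Gaussian mixture to group 3D observations into per-speaker clusters. This is not the paper's full conjugate audio-visual model (it handles only one observation space, with isotropic covariances, and the `em_gmm` function and its deterministic initialization are illustrative assumptions), but it shows the E-step/M-step alternation that underlies the inference.

```python
import numpy as np

def em_gmm(points, n_speakers, n_iters=50):
    """Cluster 3D observations into n_speakers groups with EM on a
    Gaussian mixture (isotropic covariances, for simplicity)."""
    n, d = points.shape
    # Simple deterministic init: evenly spaced data points as means
    # (a k-means++-style init would be more robust in practice).
    idx = np.linspace(0, n - 1, n_speakers).astype(int)
    means = points[idx].copy()
    variances = np.full(n_speakers, points.var())
    weights = np.full(n_speakers, 1.0 / n_speakers)
    for _ in range(n_iters):
        # E-step: posterior responsibility of each component for each point.
        sq = ((points[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
        log_p = (-0.5 * sq / variances
                 - 0.5 * d * np.log(2.0 * np.pi * variances)
                 + np.log(weights))
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and variances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ points) / nk[:, None]
        sq = ((points[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
        variances = (resp * sq).sum(axis=0) / (d * nk)
    return means, resp.argmax(axis=1)
```

In the paper's setting, two such mixtures (one per modality) share the same set of 3D speaker positions, so the E-step combines auditory and visual responsibilities before the shared M-step updates the positions.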