Robust user context analysis for multimodal interfaces
ICMI '11 Proceedings of the 13th international conference on multimodal interfaces
This paper presents a multi-modal approach to locating a speaker in a scene and determining to whom he or she is speaking. We present a simple probabilistic framework that combines multiple cues derived from both audio and video. A purely visual cue is obtained from a head tracker, which identifies possible speakers in the scene and provides both their 3-D positions and orientations. In addition, estimates of the audio signal's direction of arrival are obtained using a two-element microphone array. A third cue measures the association between the audio and the tracked regions in the video. Integrating these cues yields a more robust solution than any single cue alone. The usefulness of our approach is demonstrated by results on video sequences with two or more people in a prototype interactive kiosk environment.
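The two core operations the abstract describes — estimating the audio direction of arrival from a two-element microphone array, and probabilistically fusing several per-speaker cues — can be sketched as below. This is an illustrative reconstruction, not the paper's actual implementation: the cross-correlation delay estimator, the microphone spacing, the sampling rate, and the independent-cue (naive-Bayes-style) fusion rule are all assumptions made for the example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate speed of sound in air


def estimate_doa(left, right, mic_spacing, sample_rate):
    """Estimate the direction of arrival (radians from broadside) for a
    two-element microphone array.

    The inter-microphone time delay is found at the peak of the
    cross-correlation of the two channels, then converted to an angle
    via the far-field geometry sin(theta) = c * tau / d.
    (Illustrative sketch; the paper does not specify this estimator.)
    """
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)  # delay of `left` in samples
    tau = lag / sample_rate                   # delay in seconds
    # Clip to the physically realizable range before taking arcsin.
    s = np.clip(SPEED_OF_SOUND * tau / mic_spacing, -1.0, 1.0)
    return np.arcsin(s)


def fuse_cues(cue_probs):
    """Fuse per-cue speaker probabilities by treating the cues as
    independent: multiply them per candidate speaker and renormalize.

    cue_probs: array-like of shape (n_cues, n_candidates), each row a
    probability distribution over candidate speakers from one cue
    (e.g. head pose, audio DOA, audio-video association).
    """
    p = np.prod(np.asarray(cue_probs, dtype=float), axis=0)
    return p / p.sum()
```

For example, a visual cue favoring speaker A (0.7 vs 0.3), a DOA cue at (0.6, 0.4), and an association cue at (0.8, 0.2) fuse to a distribution concentrated on speaker A, illustrating how agreement among weak individual cues produces a more confident combined estimate.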