In this paper, we describe a low-delay, real-time multimodal cue detection engine for a living room environment. The system is designed for open, unconstrained environments in which multiple people can freely enter the observable world, interact, and leave it. It comprises detection and tracking of up to four faces, estimation of head pose and visual focus of attention, detection and localisation of verbal and paralinguistic events, and the association and fusion of these cues. The system is designed as a flexible component to be used in conjunction with an orchestrated video conferencing system, with the aim of improving the overall experience of interaction between spatially separated families and friends. The latency reductions achieved to date have improved the responsiveness of the system.