We address the problem of recognizing the visual focus of attention (VFOA) of meeting participants based on their head pose. To this end, the head pose observations are modeled using a Gaussian mixture model (GMM) or a hidden Markov model (HMM) whose hidden states correspond to the VFOA. The novelties of this paper are threefold. First, contrary to previous studies on the topic, in our setup the potential VFOA of a person is not restricted to other participants. It also includes environmental targets (a table and a projection screen), which increases the complexity of the task because more VFOA targets are spread across both the pan and tilt dimensions of the gaze space. Second, we propose a geometric model to set the GMM or HMM parameters by exploiting results from cognitive science on saccadic eye motion, which allows the prediction of the head pose given a gaze target. Third, we propose an unsupervised parameter adaptation step, requiring no labeled data, that accounts for the specific gazing behavior of each participant. Using a publicly available corpus of eight meetings featuring four persons, we analyze the above methods by evaluating, through objective performance measures, the recognition of the VFOA from head pose information obtained using either a magnetic sensor device or a vision-based tracking system. The results clearly show that in such complex but realistic situations, the VFOA recognition performance is highly dependent on how well the visual targets are separated for a given meeting participant. In addition, the results show that the use of a geometric model with unsupervised adaptation achieves better results than the use of training data to set the HMM parameters.
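To make the modeling idea concrete, the following is a minimal sketch of an HMM whose hidden states are VFOA targets and whose observations are (pan, tilt) head poses, decoded with the Viterbi algorithm. It is an illustration under stated assumptions, not the model used in the paper. In particular, the linear rule in `predicted_head_pose` (the head turns a fixed fraction `kappa` of the way from a reference direction to the gaze target), the values of `kappa`, `std`, and `stay_prob`, and the target directions are all hypothetical choices made for this example. The unsupervised adaptation step is not shown.

```python
import math

def predicted_head_pose(target, reference=(0.0, 0.0), kappa=(0.5, 0.4)):
    """Assumed linear rule: the head turns a fraction kappa (per axis) of the
    way from the reference direction to the gaze target. Angles in degrees."""
    return tuple(r + k * (t - r) for t, r, k in zip(target, reference, kappa))

def log_gaussian(x, mean, std):
    """Log-density of a diagonal-covariance Gaussian over (pan, tilt)."""
    return sum(
        -0.5 * ((xi - mi) / si) ** 2 - math.log(si * math.sqrt(2.0 * math.pi))
        for xi, mi, si in zip(x, mean, std)
    )

def viterbi(observations, means, std, stay_prob=0.9):
    """Most likely VFOA sequence under a 'sticky' HMM: each state keeps itself
    with probability stay_prob and moves uniformly to the others otherwise."""
    names = list(means)
    n = len(names)
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (n - 1))
    # Uniform prior over targets for the first frame.
    score = [math.log(1.0 / n) + log_gaussian(observations[0], means[s], std)
             for s in names]
    back = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(names):
            # Best predecessor of state j, including the transition cost.
            best_i = max(
                range(n),
                key=lambda i: score[i] + (log_stay if i == j else log_move),
            )
            trans = log_stay if best_i == j else log_move
            new_score.append(score[best_i] + trans + log_gaussian(obs, means[s], std))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return [names[i] for i in reversed(path)]

# Hypothetical gaze directions (pan, tilt) of each target, in degrees.
targets = {
    "person_left": (-60.0, 0.0),
    "person_right": (60.0, 0.0),
    "table": (0.0, -40.0),
    "screen": (40.0, 20.0),
}
# Gaussian means come from the geometric rule instead of labeled training data.
means = {name: predicted_head_pose(direction) for name, direction in targets.items()}

head_poses = [(-28.0, 1.0), (-31.0, -2.0), (2.0, -15.0), (1.0, -17.0), (29.0, 2.0)]
decoded = viterbi(head_poses, means, std=(8.0, 8.0))
print(decoded)
```

Because the predicted head poses are compressed toward the reference direction, targets that are well separated in gaze space can end up close together in head-pose space, which is consistent with the abstract's observation that performance depends on how well the targets are separated for a given participant.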