Recognizing visual focus of attention from head pose in natural meetings

Authors:
Sileye O. Ba;Jean-Marc Odobez
Affiliations:
Institut Dalle Molle d'Intelligence Artificielle Perceptive, Research Institute, Martigny, Switzerland and Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland;Institut Dalle Molle d'Intelligence Artificielle Perceptive, Research Institute, Martigny, Switzerland and Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics - Special issue on human computing
Year:
2009



Abstract

We address the problem of recognizing the visual focus of attention (VFOA) of meeting participants based on their head pose. To this end, the head pose observations are modeled using a Gaussian mixture model (GMM) or a hidden Markov model (HMM) whose hidden states correspond to the VFOA. The novelties of this paper are threefold. First, contrary to previous studies on the topic, the potential VFOA of a person in our setup is not restricted to other participants. It also includes environmental targets (a table and a projection screen), which makes the task more complex because there are more VFOA targets, spread across both the pan and tilt dimensions of gaze space. Second, we propose a geometric model to set the GMM or HMM parameters by exploiting results from cognitive science on saccadic eye motion, which allows the prediction of the head pose given a gaze target. Third, we propose an unsupervised parameter adaptation step that requires no labeled data and accounts for the specific gazing behavior of each participant. Using a publicly available corpus of eight meetings featuring four persons, we analyze these methods by evaluating, through objective performance measures, the recognition of the VFOA from head pose information obtained using either a magnetic sensor device or a vision-based tracking system. The results clearly show that in such complex but realistic situations, the VFOA recognition performance is highly dependent on how well the visual targets are separated for a given meeting participant. In addition, the results show that the use of a geometric model with unsupervised adaptation achieves better results than the use of training data to set the HMM parameters.
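To make the modeling idea concrete, here is a minimal sketch of an HMM whose hidden states are VFOA targets and whose Gaussian means come from a simple geometric rule. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the form of the geometric model, so the linear rule below (the head covers a fixed fraction `kappa` of the rotation toward the gaze target) is an assumed simplification. The target directions, `kappa`, `std`, and `stay` values are all hypothetical, and the unsupervised adaptation step is not shown.

```python
import numpy as np

# Hypothetical gaze directions (pan, tilt) in degrees for each VFOA target,
# as seen from one participant's seat. Illustrative values only.
TARGETS = {
    "person_left": (-45.0, 0.0),
    "person_right": (40.0, 0.0),
    "table": (0.0, -40.0),
    "screen": (70.0, 10.0),
}

def head_pose_means(targets, reference=(0.0, 0.0), kappa=(0.5, 0.4)):
    """Assumed geometric rule: the head covers only a fraction kappa
    (per axis) of the rotation from the reference direction to the gaze."""
    ref = np.asarray(reference, dtype=float)
    k = np.asarray(kappa, dtype=float)
    return {name: ref + k * (np.asarray(g) - ref) for name, g in targets.items()}

def log_gaussian(x, mean, std):
    """Log density of an axis-aligned 2-D Gaussian, one value per row of x."""
    z = (x - mean) / std
    return -0.5 * np.sum(z ** 2, axis=-1) - np.sum(np.log(std)) - np.log(2 * np.pi)

def viterbi(log_lik, log_trans, log_prior):
    """Most likely state sequence for a (T, K) matrix of log-likelihoods."""
    T, K = log_lik.shape
    score = log_prior + log_lik[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans      # cand[i, j]: move from i to j
        back[t] = np.argmax(cand, axis=0)      # best predecessor of each j
        score = cand[back[t], np.arange(K)] + log_lik[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def recognize_vfoa(poses, targets=TARGETS, std=(8.0, 8.0), stay=0.9):
    """Decode a sequence of (pan, tilt) head poses into VFOA target names."""
    names = list(targets)
    means = head_pose_means(targets)
    poses = np.asarray(poses, dtype=float)
    sd = np.asarray(std, dtype=float)
    log_lik = np.stack([log_gaussian(poses, means[n], sd) for n in names], axis=1)
    K = len(names)
    # Sticky transitions: remain on the same target with probability `stay`.
    trans = np.full((K, K), (1.0 - stay) / (K - 1))
    np.fill_diagonal(trans, stay)
    path = viterbi(log_lik, np.log(trans), np.log(np.full(K, 1.0 / K)))
    return [names[k] for k in path]

if __name__ == "__main__":
    poses = [(-22, 0), (-21, 1), (0, -15), (1, -16), (35, 4)]
    print(recognize_vfoa(poses))
```

Setting `stay` to `1 / K` (here `0.25`) makes all transitions equally likely, so the decoder reduces to frame-by-frame classification with an equal-weight GMM. Values closer to 1 smooth the decoded sequence over time, which is the practical difference between the two model families named in the abstract.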