The paper investigates the problem of addressee recognition, that is, determining to whom a speaker's utterance is directed, in a setting where a humanoid robot interacts with multiple people. More specifically, since it is well known that the addressee can primarily be derived from the speaker's visual focus of attention (VFOA), defined as whom or what a person is looking at, we address the following questions. How much does performance degrade when the VFOA is automatically estimated from head pose rather than taken from ground-truth annotations? Can the conversational context improve addressee recognition, either directly as a side cue in the addressee classifier, indirectly by improving VFOA recognition, or in both ways? Finally, from a computational perspective, which VFOA features and normalizations work best, and does it matter whether the VFOA recognition module only monitors whether a person looks at the potential addressees (the robot, the other people), or whether it also considers objects of interest in the environment (paintings, in our case) as additional VFOA targets? Experiments on the public Vernissage database, in which the humanoid robot Nao conducts a quiz with two participants, show that reducing VFOA confusion (either through context or by ignoring the additional object VFOA targets) improves addressee recognition.
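To make the idea of combining VFOA evidence with a conversational-context cue more concrete, the sketch below shows one minimal way such an addressee decision could be structured. It is an illustrative assumption, not the paper's actual model: the target names, the context cue (whether the robot has just asked the current speaker a question), and the additive context weight are all hypothetical choices made for this example.

```python
# Minimal illustrative sketch: per-utterance VFOA features plus a crude
# dialog-context prior for addressee recognition. Names and weights are
# assumptions for illustration, not the method evaluated in the paper.
from collections import Counter

TARGETS = ["robot", "left_person", "right_person", "painting"]   # candidate VFOA targets (illustrative)
ADDRESSEES = ["robot", "left_person", "right_person"]            # objects cannot be addressees

def vfoa_features(vfoa_frames, targets=TARGETS):
    """Fraction of the utterance during which the speaker looks at each target.

    vfoa_frames: per-frame VFOA labels of the speaker over one utterance,
    coming either from ground-truth annotation or from a head-pose-based tracker.
    """
    counts = Counter(vfoa_frames)
    total = max(len(vfoa_frames), 1)
    return {t: counts.get(t, 0) / total for t in targets}

def classify_addressee(vfoa_frames, robot_just_asked_speaker=False, context_weight=0.2):
    """Score each candidate addressee and return the highest-scoring one.

    Gaze evidence (time spent looking at a candidate) is combined with a crude
    context prior: if the robot has just asked the current speaker a question,
    the reply is more likely addressed to the robot. The 0.2 weight is an
    arbitrary illustrative value, not a tuned parameter.
    """
    feats = vfoa_features(vfoa_frames)
    scores = {a: feats.get(a, 0.0) for a in ADDRESSEES}
    if robot_just_asked_speaker:
        scores["robot"] += context_weight
    return max(scores, key=scores.get)

if __name__ == "__main__":
    # Speaker mostly looks at the robot, glances briefly at a painting.
    frames = ["robot"] * 30 + ["painting"] * 5 + ["robot"] * 15
    print(classify_addressee(frames, robot_just_asked_speaker=True))  # -> robot
```

In practice a learned classifier over such features would replace the hand-set weight, but the overall structure, gaze proportions over candidate targets combined with a conversational-context cue, mirrors the questions examined in the paper.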