Identifying the addressee in human-human-robot interactions based on head pose and speech

  • Authors:
  • Michael Katzenmaier; Rainer Stiefelhagen; Tanja Schultz

  • Affiliations:
  • Universität Karlsruhe (TH), Karlsruhe, Germany; Universität Karlsruhe (TH), Karlsruhe, Germany; Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • Proceedings of the 6th International Conference on Multimodal Interfaces
  • Year:
  • 2004

Abstract

In this work we investigate the power of acoustic and visual cues, and their combination, to identify the addressee in a human-human-robot interaction. Based on eighteen audio-visual recordings of two human beings and a (simulated) robot, we discriminate the interaction of the two humans from the interaction of one human with the robot. The paper compares the results of three approaches. The first approach uses purely acoustic cues to find the addressees; low-level, feature-based cues as well as higher-level cues are examined. In the second approach we test whether the human's head pose is a suitable cue. Our results show that visually estimated head pose is a more reliable cue for identifying the addressee in the human-human-robot interaction. In the third approach we combine the acoustic and visual cues, which yields significant improvements.
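The combination step described in the abstract can be thought of as a late fusion of two per-utterance scores: one from speech-based cues and one from the estimated head pose. The sketch below is only an illustration of that idea under assumed values; the weights, the Gaussian falloff on the pan angle, the threshold, and the function names are not taken from the paper.

```python
import math


def head_pose_score(pan_deg, robot_pan_deg=0.0, sigma=15.0):
    """Score in [0, 1]: high when the speaker's head pan angle is close to
    the (assumed) direction of the robot. Gaussian falloff with an
    illustrative sigma of 15 degrees."""
    return math.exp(-((pan_deg - robot_pan_deg) ** 2) / (2.0 * sigma ** 2))


def fuse(acoustic_p_robot, visual_p_robot, w_visual=0.6):
    """Weighted late fusion of the two probabilities that the robot is the
    addressee. The weight is an assumption, not a reported value."""
    return w_visual * visual_p_robot + (1.0 - w_visual) * acoustic_p_robot


def addressee(acoustic_p_robot, pan_deg, threshold=0.5):
    """Classify one utterance as robot- or human-directed."""
    p = fuse(acoustic_p_robot, head_pose_score(pan_deg))
    return "robot" if p >= threshold else "human"


if __name__ == "__main__":
    # Utterance spoken while looking roughly toward the robot, with speech
    # features the acoustic classifier rated as robot-directed.
    print(addressee(acoustic_p_robot=0.7, pan_deg=5.0))   # -> robot
    # Utterance spoken while facing the other person.
    print(addressee(acoustic_p_robot=0.4, pan_deg=60.0))  # -> human
```

Weighting the visual score more heavily here simply mirrors the abstract's finding that head pose was the more reliable single cue; in practice such weights would be tuned on held-out recordings.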