Recent research aims to develop new open-microphone engagement techniques capable of identifying when a speaker is addressing a computer versus a human partner, including during computer-assisted group interactions. The present research explores: (1) how accurately people can judge whether an intended interlocutor is a human versus a computer, (2) which linguistic, acoustic-prosodic, and visual information sources they use to make these judgments, and (3) what types of systematic errors are present in their judgments. Sixteen participants were asked to determine a speaker's intended addressee based on actual videotaped utterances matched on illocutionary force, which were played back as: (1) lexical transcriptions only, (2) audio only, (3) visual information only, and (4) combined audio-visual information. Perhaps surprisingly, people's accuracy in judging human versus computer addressees did not exceed chance levels with lexical-only content (46%). As predicted, accuracy improved significantly with audio (58%), visual (57%), and especially audio-visual information (63%). Overall, accuracy in detecting human interlocutors was significantly worse than in detecting computer ones, and specifically worse when only visual information was present, because speakers often looked at the computer while addressing their peers. In contrast, accuracy in judging computer interlocutors was significantly better whenever visual information was present than with audio alone, and it yielded the highest accuracy levels observed (86%). Questionnaire data also revealed that speakers' gaze, peers' gaze, and tone of voice were considered the most valuable information sources. These results indicate that people rely on cues appropriate for interpersonal interaction when distinguishing computer- from human-directed speech during mixed human-computer interactions, even though doing so degrades their accuracy. Future systems that process actual rather than expected communication patterns could potentially be designed to outperform human judges.
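The comparisons against chance reported above can be illustrated with a standard binomial test. The sketch below is only illustrative: the abstract gives the sixteen participants and the percentage-correct scores but not the number of judgment trials per condition, so `n_trials` is an assumed value chosen for demonstration.

```python
from scipy.stats import binomtest

# Mean judgment accuracies per playback condition, as reported above.
accuracies = {
    "lexical-only": 0.46,
    "audio-only": 0.58,
    "visual-only": 0.57,
    "audio-visual": 0.63,
}

# Hypothetical trial count per condition (NOT given in the abstract);
# e.g., 16 participants x 20 judged utterances each.
n_trials = 320

for condition, acc in accuracies.items():
    successes = round(acc * n_trials)
    # Two-sided binomial test against 50% chance, since there are
    # exactly two addressee types (human vs. computer).
    result = binomtest(successes, n_trials, p=0.5)
    print(f"{condition}: {acc:.0%} correct, p = {result.pvalue:.4f}")
```

With trial counts of this order, the lexical-only score sits near chance while the audio, visual, and audio-visual scores fall well above it, consistent with the pattern of significance the abstract describes; the actual study statistics would depend on the true trial counts and test used.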