Proceedings of the 14th ACM International Conference on Multimodal Interaction
The way we see the objects around us determines the speech and gestures we use to refer to them. Conversely, the gestures we produce structure our visual perception, and the words we use influence the way we see. Visual perception, language and gesture thus interact with one another in multiple ways. The problem is global and must be tackled as a whole in order to understand the complexity of reference phenomena and to derive a formal model. Such a model may be useful for any human-machine dialogue system that aims at deep comprehension. We show how a referring act takes place within a contextual subset of objects. This subset, called a 'reference domain', is implicit and can be deduced from a number of cues, some coming from the visual context and others from the multimodal utterance. We present the 'multimodal reference domain' model, which takes these cues into account and can be exploited by a multimodal dialogue system when interpreting referring expressions.
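The abstract only names the model, so the following is a minimal illustrative sketch rather than the authors' implementation: all identifiers (SceneObject, ReferenceDomain, domain_from_gesture, resolve_reference) and the gesture-radius heuristic are assumptions introduced for illustration. It shows how a deictic gesture could delimit an implicit reference domain and how a linguistic cue (here, a color adjective) could then pick out a referent inside that domain.

```python
# Illustrative sketch only: the paper publishes no code, so every name and
# heuristic below is a hypothetical stand-in for the described model.
from dataclasses import dataclass


@dataclass
class SceneObject:
    """An object in the visual scene, with features used for reference."""
    name: str
    color: str
    x: float
    y: float


@dataclass
class ReferenceDomain:
    """An implicit contextual subset of objects in which a referring act occurs."""
    objects: list[SceneObject]
    focus: SceneObject | None = None  # most salient object in the domain, if any


def domain_from_gesture(scene, gx, gy, radius=1.0):
    """Build a reference domain from a deictic gesture: objects near the
    pointing location form the domain; the closest one becomes the focus."""
    nearby = [o for o in scene if (o.x - gx) ** 2 + (o.y - gy) ** 2 <= radius ** 2]
    focus = min(nearby, key=lambda o: (o.x - gx) ** 2 + (o.y - gy) ** 2, default=None)
    return ReferenceDomain(objects=nearby, focus=focus)


def resolve_reference(domain, color=None):
    """Interpret a referring expression inside the domain: the linguistic
    constraint (here, just color) filters the candidates, and the domain
    focus breaks ties when several candidates remain."""
    candidates = [o for o in domain.objects if color is None or o.color == color]
    if domain.focus in candidates:
        return domain.focus
    return candidates[0] if candidates else None


if __name__ == "__main__":
    scene = [
        SceneObject("cube-1", "red", 0.2, 0.3),
        SceneObject("cube-2", "blue", 0.5, 0.4),
        SceneObject("cube-3", "red", 4.0, 4.0),  # outside the gestured area
    ]
    # "the red one" + pointing gesture near (0.4, 0.4):
    domain = domain_from_gesture(scene, gx=0.4, gy=0.4)
    print(resolve_reference(domain, color="red"))  # -> cube-1, not cube-3
```

In this toy run, the pointing gesture first restricts the candidates to an implicit domain (cube-1 and cube-2), and the linguistic constraint 'red' then singles out cube-1, even though another red object exists elsewhere in the scene, which is the kind of interplay between visual and multimodal cues the abstract describes.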