People frequently and effectively integrate deictic and graphic gestures with their natural language (NL) when conducting human-to-human dialogue. Similar multi-modal communication can facilitate human interaction with modern sophisticated information processing and decision-aiding computer systems. As part of the CUBRICON project, we are developing NL processing technology that incorporates deictic and graphic gestures with simultaneous coordinated NL for both user inputs and system-generated outputs. Such multi-modal language should be natural and efficient for human-computer dialogue, particularly for presenting or requesting information about objects that are visible, or can be presented visibly, on a graphics display.

This paper discusses unique interface capabilities that the CUBRICON system provides, including the ability to:

(1) accept and understand multi-media input, such that references to entities in (spoken or typed) natural language sentences can include coordinated simultaneous pointing to the respective entities on a graphics display; use simultaneous pointing and NL references to disambiguate one another when appropriate; and infer the intended referent of a point gesture that is inconsistent with the accompanying NL;

(2) dynamically compose and generate multi-modal language that combines NL with deictic gestures and graphic expressions; synchronously present spoken natural language with coordinated pointing gestures and graphic expressions; and discriminate between spoken and written NL.
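The referent-resolution behavior in capability (1) can be illustrated with a minimal sketch. This is not the CUBRICON implementation; the `DisplayObject` type, `resolve_referent` function, the proximity radius, and the example map objects are all hypothetical, chosen only to show the idea of letting the NL type constraint override a point gesture that lands on an object of the wrong kind:

```python
from dataclasses import dataclass
from math import dist


@dataclass
class DisplayObject:
    """A selectable entity on the graphics display (hypothetical type)."""
    name: str
    kind: str              # NL category, e.g. "airbase", "SAM site"
    position: tuple        # (x, y) screen coordinates


def resolve_referent(nl_kind, point_xy, objects, radius=30.0):
    """Fuse an NL type constraint with a deictic point gesture.

    1. Prefer objects of the stated kind that lie near the point.
    2. If no object of that kind is near the point (i.e. the gesture
       is inconsistent with the accompanying NL), trust the NL type
       and infer the nearest object of that kind as the referent.
    """
    candidates = [o for o in objects if o.kind == nl_kind]
    if not candidates:
        return None
    near = [o for o in candidates if dist(o.position, point_xy) <= radius]
    pool = near or candidates          # fall back when gesture conflicts with NL
    return min(pool, key=lambda o: dist(o.position, point_xy))
```

For example, if the user says "this airbase" while pointing near a SAM site, the sketch ignores the nearby object of the wrong kind and returns the nearest airbase, mirroring the inference described above.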