Toward multimodal fusion of affective cues
Proceedings of the 1st ACM international workshop on Human-centered multimedia
Hand gestures and speech comprise the most important modalities of human-to-human interaction. Motivated by this, there has been considerable interest in incorporating these modalities into "natural" human-computer interaction (HCI), particularly within virtual environments. An important feature of such a natural interface would be the absence of predefined speech and gesture commands. The resulting bimodal speech/gesture HCI "language" would thus have to be interpreted by the computer. This involves challenges ranging from the low-level signal processing of bimodal (audio/video) input to the high-level interpretation of natural speech/gesture in HCI. This paper identifies the issues of natural (non-predefined) multimodal HCI interpretation. Since, in natural interaction, gestures do not exhibit a one-to-one mapping of form to meaning, we specifically address problems associated with vision-based gesture interpretation in a multimodal interface. We consider the design of a speech/gesture interface in the context of a set of spatial tasks defined on a computerized campus map. The task context makes it possible to study the critical components of the multimodal interpretation and integration problem.