Optimization in Multimodal Interpretation

  • Authors:
  • Joyce Y. Chai; Pengyu Hong; Michelle X. Zhou; Zahar Prasov

  • Affiliations:
  • Michigan State University, East Lansing, MI; Harvard University, Cambridge, MA; IBM T. J. Watson Research Center, Hawthorne, NY; Michigan State University, East Lansing, MI

  • Venue:
  • ACL '04: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics
  • Year:
  • 2004

Abstract

In a multimodal conversation, the way users communicate with a system depends on the available interaction channels and the situated context (e.g., conversation focus, visual feedback). These dependencies form a rich set of constraints from various perspectives, such as temporal alignment between different modalities, coherence of the conversation, and domain semantics. There is strong evidence that the competition and ranking of these constraints are important for achieving an optimal interpretation. We have therefore developed an optimization approach to multimodal interpretation, in particular the interpretation of multimodal references. A preliminary evaluation indicates the effectiveness of this approach, especially for complex user inputs that involve multiple referring expressions in a speech utterance and multiple gestures.
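To make the idea of constraint competition and ranking concrete, the sketch below is a minimal, hypothetical Python illustration, not the authors' actual algorithm (which the abstract does not spell out). It scores each candidate pairing of referring expressions with gesture-indicated objects by a weighted sum of soft constraints (temporal alignment, semantic compatibility, and salience from the conversation focus) and picks the highest-ranked overall assignment. All data, names, and weights here are assumptions for illustration.

```python
from itertools import permutations

# Hypothetical inputs: two referring expressions from one speech
# utterance, and two gesture-indicated objects (illustrative only).
referring_exprs = [
    {"text": "this one", "time": 1.2, "sem_type": "house"},
    {"text": "that red house", "time": 2.8, "sem_type": "house"},
]
gestures = [
    {"object_id": "house_7", "time": 1.3, "sem_type": "house", "salience": 0.9},
    {"object_id": "house_2", "time": 2.6, "sem_type": "house", "salience": 0.4},
]

# Illustrative constraint weights: temporal alignment, semantic
# compatibility, and salience from the conversation focus.
W_TIME, W_SEM, W_SAL = 1.0, 2.0, 0.5

def constraint_score(expr, gesture):
    """Weighted sum of soft constraints for pairing one referring
    expression with one gesture-indicated object."""
    temporal = -abs(expr["time"] - gesture["time"])  # closer in time is better
    semantic = 1.0 if expr["sem_type"] == gesture["sem_type"] else -1.0
    return W_TIME * temporal + W_SEM * semantic + W_SAL * gesture["salience"]

def best_interpretation(exprs, gestures):
    """Let candidate one-to-one assignments compete; return the
    highest-ranked assignment and its total score."""
    best, best_score = None, float("-inf")
    for perm in permutations(gestures, len(exprs)):
        score = sum(constraint_score(e, g) for e, g in zip(exprs, perm))
        if score > best_score:
            best, best_score = list(zip(exprs, perm)), score
    return best, best_score

if __name__ == "__main__":
    assignment, total = best_interpretation(referring_exprs, gestures)
    for expr, gesture in assignment:
        print(f"{expr['text']!r} -> {gesture['object_id']} "
              f"(score {constraint_score(expr, gesture):+.2f})")
    print(f"total score: {total:.2f}")
```

With these toy weights, the temporally aligned pairing ("this one" → house_7, "that red house" → house_2) outranks the swapped one; in practice the constraint set and its ranking would be far richer than this sketch suggests.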