Linguistic theories in efficient multimodal reference resolution: an empirical investigation

  • Authors:
  • Joyce Y. Chai, Zahar Prasov, Joseph Blaim, Rong Jin

  • Affiliations:
  • Michigan State University, East Lansing, MI (all authors)

  • Venue:
  • Proceedings of the 10th international conference on Intelligent user interfaces
  • Year:
  • 2005

Abstract

Multimodal conversational interfaces provide a natural means for users to communicate with computer systems through multiple modalities such as speech, gesture, and gaze. Building effective multimodal interfaces requires understanding users' multimodal inputs. Previous linguistic and cognitive studies indicate that user language behavior does not occur randomly, but rather follows certain linguistic and cognitive principles. This paper therefore investigates the use of linguistic theories in multimodal interpretation. In particular, we present a greedy algorithm that incorporates Conversational Implicature and the Givenness Hierarchy for efficient multimodal reference resolution. Empirical studies indicate that this algorithm significantly reduces the complexity of multimodal reference resolution compared to a previous graph-matching approach. One major advantage of the greedy algorithm is that prior linguistic and cognitive knowledge can be used to guide the search and significantly prune the search space. Because of its simplicity and generality, this approach has the potential to improve the robustness of interpretation and provide a more practical solution to multimodal input interpretation.
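To give a flavor of greedy, knowledge-pruned reference resolution as the abstract describes it, the sketch below matches spoken referring expressions to gestured objects in temporal order. This is a minimal illustration under assumptions, not the authors' algorithm: the `compatibility` scoring, the feature sets, and the temporal-distance penalty are all hypothetical stand-ins for the linguistic constraints (e.g., the salience ordering of the Givenness Hierarchy) used in the paper.

```python
from dataclasses import dataclass, field

@dataclass
class ReferringExpression:
    text: str            # e.g., "this red cup"
    time: float          # onset time of the spoken phrase (seconds)
    features: set = field(default_factory=set)  # semantic constraints from the utterance

@dataclass
class Candidate:
    name: str            # object identifier
    time: float          # time the object was gestured at / brought into focus
    features: set = field(default_factory=set)

def compatibility(expr, cand):
    """Hypothetical score: feature overlap minus a temporal-distance penalty.
    Prior knowledge (e.g., a gestured object being highly 'activated' on the
    Givenness Hierarchy) would weight this; here it is a plain heuristic."""
    overlap = len(expr.features & cand.features)
    if expr.features and overlap == 0:
        return float("-inf")  # prune semantically incompatible candidates outright
    return overlap - 0.1 * abs(expr.time - cand.time)

def greedy_resolve(expressions, candidates):
    """Resolve each expression to its best unused candidate, in temporal order.
    Each choice is made locally, so cost grows linearly with the number of
    expressions rather than combinatorially as in exhaustive graph matching."""
    resolved, used = {}, set()
    for expr in sorted(expressions, key=lambda e: e.time):
        pool = [c for c in candidates if c.name not in used]
        best = max(pool, key=lambda c: compatibility(expr, c), default=None)
        if best is not None and compatibility(expr, best) > float("-inf"):
            resolved[expr.text] = best.name
            used.add(best.name)
    return resolved

# Hypothetical interaction: two phrases, three visible objects.
exprs = [ReferringExpression("this red cup", 1.0, {"red", "cup"}),
         ReferringExpression("that plate", 2.5, {"plate"})]
cands = [Candidate("cup_1", 1.1, {"red", "cup"}),
         Candidate("plate_1", 2.4, {"plate", "white"}),
         Candidate("cup_2", 5.0, {"blue", "cup"})]
print(greedy_resolve(exprs, cands))  # → {'this red cup': 'cup_1', 'that plate': 'plate_1'}
```

The early `-inf` pruning is where the abstract's claim lives: linguistic constraints eliminate most candidate pairings before any matching cost is paid, which is what makes the greedy search cheap compared to scoring every pairing in a full graph match.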