Utilizing visual attention for cross-modal coreference interpretation

  • Authors:
  • Donna Byron; Thomas Mampilly; Vinay Sharma; Tianfang Xu

  • Affiliation:
  • Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio

  • Venue:
  • CONTEXT'05 Proceedings of the 5th international conference on Modeling and Using Context
  • Year:
  • 2005

Abstract

In this paper, we describe an exploratory study to develop a model of visual attention that could aid automatic interpretation of exophors in situated dialog. The model is intended to support the reference resolution needs of embodied conversational agents, such as graphical avatars and robotic collaborators. The model tracks the attentional state of one dialog participant as represented by that participant's visual input stream, taking into account the recency, exposure time, and visual distinctness of each viewed item. The model predicts the correct referent of 52% of the referring expressions produced by speakers in human-human dialog while they collaborated on a task in a virtual world. This accuracy is comparable to that of reference resolution based on linguistic salience computed over the same data.
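The abstract names three visual cues the attention model combines: recency, exposure time, and visual distinctness of each viewed item. The paper's actual scoring function is not reproduced here; the sketch below is only a minimal illustration of that general idea, with hypothetical weights, decay, and helper names (`ViewedItem`, `visual_salience`, `resolve_exophor`) chosen for clarity rather than taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class ViewedItem:
    """An object that has appeared in the viewer's visual input stream."""
    name: str
    last_seen: float      # timestamp of most recent appearance (seconds)
    exposure: float       # cumulative time in view (seconds)
    distinctness: float   # visual distinctness in [0, 1]

def visual_salience(item: ViewedItem, now: float,
                    w_recency: float = 0.5,
                    w_exposure: float = 0.3,
                    w_distinct: float = 0.2) -> float:
    """Combine recency, exposure time, and visual distinctness into one score.
    Weights, decay, and saturation are illustrative assumptions, not the paper's values."""
    recency = 1.0 / (1.0 + (now - item.last_seen))   # decays once the item leaves view
    exposure = min(item.exposure / 10.0, 1.0)        # saturates after 10 s of viewing
    return w_recency * recency + w_exposure * exposure + w_distinct * item.distinctness

def resolve_exophor(candidates: list[ViewedItem], now: float) -> ViewedItem:
    """Pick the most visually salient candidate as the referent of an exophor."""
    return max(candidates, key=lambda item: visual_salience(item, now))

# Example: the speaker says "that one" at t = 12 s.
items = [
    ViewedItem("red_button", last_seen=11.5, exposure=4.0, distinctness=0.9),
    ViewedItem("door",       last_seen=6.0,  exposure=9.0, distinctness=0.4),
]
print(resolve_exophor(items, now=12.0).name)   # -> red_button
```

In this toy setup, the recently seen and visually distinct `red_button` outscores the longer-exposed but stale `door`, mirroring the intuition that an exophor such as "that one" tends to pick out whatever currently dominates the listener's visual attention.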