Where is "it"? Event Synchronization in Gaze-Speech Input Systems

  • Authors:
  • Manpreet Kaur; Marilyn Tremaine; Ning Huang; Joseph Wilder; Zoran Gacovski; Frans Flippo; Chandra Sekhar Mantravadi

  • Affiliations:
  • Rutgers University, Piscataway, NJ (Kaur, Huang, Wilder, Gacovski, Flippo, Mantravadi); New Jersey Institute of Technology, Newark, NJ (Tremaine)

  • Venue:
  • Proceedings of the 5th international conference on Multimodal interfaces
  • Year:
  • 2003

Abstract

The relationship between gaze and speech is explored for the simple task of moving an object from one location to another on a computer screen. The subject moves a designated object from a group of objects to a new location on the screen by stating, "Move it there". Gaze and speech data are captured to determine if we can robustly predict the selected object and destination position. We have found that the source fixation closest to the desired object begins, with high probability, before the beginning of the word "Move". An analysis of all fixations before and after speech onset time shows that the fixation that best identifies the object to be moved occurs, on average, 630 milliseconds before speech onset with a range of 150 to 1200 milliseconds for individual subjects. The variance in these times for individuals is relatively small although the variance across subjects is large. Selecting a fixation closest to the onset of the word "Move" as the designator of the object to be moved gives a system accuracy close to 95% for all subjects. Thus, although significant differences exist between subjects, we believe that the speech and gaze integration patterns can be modeled reliably for individual users and therefore be used to improve the performance of multimodal systems.
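The selection rule reported in the abstract (pick the fixation closest in time to the onset of the word "Move") can be expressed compactly. Below is a minimal Python sketch of that rule, assuming fixations and speech onsets are timestamped on a shared clock; the Fixation class, its field names, and the example timings are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    x: float          # fixation centroid x on screen (pixels)
    y: float          # fixation centroid y on screen (pixels)
    onset_ms: float   # fixation start time (ms, same clock as speech)

def fixation_nearest_speech_onset(fixations, speech_onset_ms):
    """Return the fixation whose onset is closest in time to speech onset.

    Mirrors the rule described in the abstract: the fixation nearest the
    onset of the word "Move" designates the object to be moved.
    """
    if not fixations:
        return None
    return min(fixations, key=lambda f: abs(f.onset_ms - speech_onset_ms))

# Hypothetical example: three fixations; "Move" onset at t = 2000 ms.
fixations = [
    Fixation(x=120, y=340, onset_ms=900),
    Fixation(x=415, y=220, onset_ms=1370),   # ~630 ms before speech onset
    Fixation(x=610, y=480, onset_ms=2800),
]
selected = fixation_nearest_speech_onset(fixations, speech_onset_ms=2000)
print(selected)  # Fixation(x=415, y=220, onset_ms=1370)
```

In this sketch the middle fixation wins because its onset lies about 630 ms before speech onset, consistent with the average lead time the abstract reports; in a real system the chosen fixation would then be mapped to the nearest on-screen object.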