Put that where? Voice and gesture at the graphics interface

  • Authors:
  • Mark Billinghurst

  • Affiliations:
  • Human Interface Technology Laboratory, University of Washington, Box 352-142, Seattle, WA

  • Venue:
  • ACM SIGGRAPH Computer Graphics
  • Year:
  • 1998

Abstract

A person stands in front of a large projection screen on which is shown a checked floor. They say, "Make a table," and a wooden table appears in the middle of the floor. "On the table, place a vase," they gesture using a fist relative to the palm of their other hand to show the relative location of the vase on the table. A vase appears at the correct location. "Next to the table place a chair." A chair appears to the right of the table. "Rotate it like this," while rotating their hand causes the chair to turn towards the table. "View the scene from this direction," they say while pointing one hand towards the palm of the other. The scene rotates to match their hand orientation.
In a matter of moments, a simple scene has been created using natural speech and gesture. The interface of the future? Not at all; Koons, Thorisson and Bolt demonstrated this work in 1992 [23]. Although research such as this has shown the value of combining speech and gesture at the interface, most computer graphics are still being developed with tools no more intuitive than a mouse and keyboard. This need not be the case. Current speech and gesture technologies make multimodal interfaces with combined voice and gesture input easily achievable. There are several commercial versions of continuous dictation software currently available, while tablets and pens are widely supported in graphics applications. However, having this capability doesn't mean that voice and gesture should be added to every modeling package in a haphazard manner. There are numerous issues that must be addressed in order to develop an intuitive interface that uses the strengths of both input modalities.
In this article we describe motivations for adding voice and gesture to graphical applications, review previous work showing different ways these modalities may be used and outline some general interface guidelines. Finally, we give an overview of promising areas for future research.
Our motivation for writing this is to spur developers to build compelling interfaces that will make speech and gesture as common on the desktop as the keyboard and mouse.