Put that where? voice and gesture at the graphics interface

Authors:
Mark Billinghurst
Affiliations:
Human Interface Technology Laboratory, University of Washington, Box 352-142, Seattle, WA
Venue:
ACM SIGGRAPH Computer Graphics
Year:
1998

Citing 16
Cited 7

Conversing with computers

Human-computer interaction
A synthetic visual environment with hand gesturing and voice input

CHI '89 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
The utility of speech input in user-computer interfaces

International Journal of Man-Machine Studies
Talk and Draw: Bundling Speech and Graphics

Computer
Intelligent multi-media interface technology

Intelligent user interfaces
The role of natural language in a multimodal interface

UIST '92 Proceedings of the 5th annual ACM symposium on User interface software and technology
Multi-modal natural dialogue

CHI '92 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
A design method for “whole-hand” human-computer interaction

ACM Transactions on Information Systems (TOIS)
Charade: remote control of objects using free-hand gestures

Communications of the ACM - Special issue on computer augmented environments: back to the real world
Gestures with speech for graphic manipulation

International Journal of Man-Machine Studies
Integrating simultaneous input from speech, gaze, and hand gestures

Intelligent multimedia interfaces
A generic platform for addressing the multimodal challenge

CHI '95 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Multimodal interfaces for dynamic interactive maps

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
User-Centered Modeling for Spoken Language and Multimodal Interfaces

IEEE MultiMedia
Unification-based multimodal integration

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Put: Language-Based Interactive Manipulation of Objects

IEEE Computer Graphics and Applications

Automatic Sign Language Analysis: A Survey and the Future beyond Lexical Meaning

IEEE Transactions on Pattern Analysis and Machine Intelligence
Voice recognition technology for visual artists with disabilities in their upper limbs

OZCHI '05 Proceedings of the 17th Australia conference on Computer-Human Interaction: Citizens Online: Considerations for Today and the Future
Multimodal astronaut virtual training prototype

International Journal of Human-Computer Studies - Interaction with virtual environments
Human centred design of 3-D interaction devices to control virtual environments

International Journal of Human-Computer Studies - Interaction with virtual environments
Vision-based hand pose estimation: A review

Computer Vision and Image Understanding
Adding speech recognition support to UML tools

Journal of Visual Languages and Computing
Pointing and speech: comparison of various voice commands

Proceedings of the 7th Nordic Conference on Human-Computer Interaction: Making Sense Through Design

Abstract

A person stands in front of a large projection screen on which is shown a checked floor. They say, "Make a table," and a wooden table appears in the middle of the floor.
"On the table, place a vase," they gesture using a fist relative to palm of their other hand to show the relative location of the vase on the table. A vase appears at the correct location.
"Next to the table place a chair." A chair appears to the right of the table.
"Rotate it like this," while rotating their hand causes the chair to turn towards the table.
"View the scene from this direction," they say while pointing one hand towards the palm of the other. The scene rotates to match their hand orientation.
In a matter of moments, a simple scene has been created using natural speech and gesture. The interface of the future? Not at all; Koons, Thorisson and Bolt demonstrated this work in 1992 [23]. Although research such as this has shown the value of combining speech and gesture at the interface, most computer graphics are still being developed with tools no more intuitive than a mouse and keyboard. This need not be the case. Current speech and gesture technologies make multimodal interfaces with combined voice and gesture input easily achievable. There are several commercial versions of continuous dictation software currently available, while tablets and pens are widely supported in graphics applications. However, having this capability doesn't mean that voice and gesture should be added to every modeling package in a haphazard manner. There are numerous issues that must be addressed in order to develop an intuitive interface that uses the strengths of both input modalities.
In this article we describe motivations for adding voice and gesture to graphical applications, review previous work showing different ways these modalities may be used and outline some general interface guidelines. Finally, we give an overview of promising areas for future research.
Our motivation for writing this is to spur developers to build compelling interfaces that will make speech and gesture as common on the desktop as the keyboard and mouse.