Gesture recognition using the Perseus architecture

  • Authors:
  • R. E. Kahn, M. J. Swain, P. N. Prokopowicz, R. J. Firby

  • Venue:
  • CVPR '96 Proceedings of the 1996 Conference on Computer Vision and Pattern Recognition (CVPR '96)
  • Year:
  • 1996

Abstract

Communication involves more than spoken information alone. Typical interactions use gestures to accurately and efficiently convey ideas that are more easily expressed with actions than with words. A more intuitive interface with machines should therefore involve not only speech recognition but gesture recognition as well. One of the most frequently used and expressively powerful gestures is pointing: it is far easier and more accurate to point to an object than to give a verbal description of its location. To produce a more efficient, accurate, and natural human-machine interface, we use the Perseus architecture to interpret the pointing gesture. Perseus uses a variety of techniques to reliably solve this complex visual problem in non-engineered worlds. Knowledge about the task and environment is used at all stages of processing to best interpret the scene for the current situation. Once the visual operators are chosen, contextual knowledge is used to tune them for maximal performance. Redundant interpretation of the scene provides robustness to errors in interpretation. Fusion of independent types of information results in increased tolerance when assumptions about the environment fail. Windows of attention are used to improve speed and remove distractions from the scene. Furthermore, reuse is a major concern in the design of Perseus: information about the environment and task is explicitly represented so it can easily be reused in tasks other than pointing. A clean interface to Perseus is provided for symbolic higher-level systems such as the RAP reactive execution system. In this paper we describe Perseus in detail and show how it is used to locate objects pointed to by people.
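The abstract mentions two of Perseus's mechanisms without detail: fusing independent types of visual evidence and restricting processing to a window of attention. The following minimal sketch illustrates those two ideas in general terms; it is not the paper's implementation, and every function name, cue map, weight, and window here is a hypothetical illustration.

```python
# Illustrative sketch (not from the paper): fuse independent per-pixel
# evidence maps, then search for the peak only inside a window of attention.

def fuse_cues(cue_maps, weights):
    """Combine independent evidence maps (same shape, values in [0, 1])
    by normalized weighted averaging."""
    rows, cols = len(cue_maps[0]), len(cue_maps[0][0])
    total = sum(weights)
    fused = [[0.0] * cols for _ in range(rows)]
    for cue, w in zip(cue_maps, weights):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += w * cue[r][c] / total
    return fused

def best_in_window(fused, window):
    """Return the (row, col) of the strongest fused response, restricted
    to a window of attention given as half-open bounds (r0, r1, c0, c1)."""
    r0, r1, c0, c1 = window
    return max(((fused[r][c], (r, c))
                for r in range(r0, r1)
                for c in range(c0, c1)))[1]

# Toy example: two 3x4 cue maps (say, motion and skin-colour evidence).
motion = [[0.1, 0.2, 0.9, 0.1],
          [0.0, 0.1, 0.8, 0.2],
          [0.1, 0.0, 0.1, 0.1]]
colour = [[0.2, 0.1, 0.7, 0.0],
          [0.1, 0.2, 0.9, 0.1],
          [0.0, 0.1, 0.2, 0.1]]

fused = fuse_cues([motion, colour], weights=[0.5, 0.5])
peak = best_in_window(fused, window=(0, 2, 1, 4))  # attend to top-right region
```

In this toy run the averaged evidence peaks at row 1, column 2 inside the window, so `peak` is `(1, 2)`. The averaging step is one simple way independent cues can cover for each other when one cue's assumptions fail, and the window keeps the search away from distractions elsewhere in the frame.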