Camera-Based Gesture Recognition for Robot Control

  • Authors:
  • Affiliations:
  • Venue: IJCNN '00 Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN'00) - Volume 4
  • Year: 2000

Abstract

Several systems for automatic gesture recognition have been developed using different strategies and approaches. In these systems, the recognition engine is mainly based on one of three algorithms: dynamic pattern matching, statistical classification, and neural networks (NNs). In this paper, we present four architectures for gesture-based interaction between a human being and an autonomous mobile robot, using the above-mentioned techniques or hybrid combinations of them. Each of our gesture recognition architectures consists of a preprocessor and a decoder. The preprocessor, which is common to every system, receives an image as input and produces a continuous feature vector. The task of the decoder is to decode a sequence of these vectors into an estimate of the underlying movement. In the first three systems, we formally treat the recognition problem as a statistical classification task and consider three different hybrid stochastic/connectionist architectures. In the first approach, NNs classify single feature vectors while Hidden Markov Models (HMMs) model sequences of them. In the second, a Radial Basis Function (RBF) network directly computes the HMM state observation probabilities. In the third system, those probabilities are calculated by recurrent neural networks (RNNs), which take into account context information from previously presented feature vectors. In the last system, we treat recognition as a template-matching problem and apply dynamic programming techniques: the strategy is to find the minimal distance between a continuous input feature sequence and the class templates. Preliminary experiments with our baseline systems achieved recognition accuracies of up to 92%. All systems use input from a monocular color video camera and are user-independent, but they do not yet run in real time.
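
The abstract does not give implementation details for the second hybrid (an RBF network supplying the HMM state observation probabilities), so the following is only a minimal sketch of the general idea: Gaussian RBF units with a softmax output stand in for the state observation densities b_j(x), and a left-to-right HMM is decoded with the standard Viterbi recursion. All dimensions, the transition structure, the random parameters, and the names `rbf_observation_probs` and `viterbi` are illustrative assumptions, not details from the paper.

```python
import numpy as np

# All sizes below are illustrative assumptions; the paper does not state them.
FEATURE_DIM = 8    # dimensionality of each preprocessor feature vector
N_STATES = 5       # HMM states in one left-to-right gesture model
N_CENTERS = 16     # Gaussian units in the RBF hidden layer

rng = np.random.default_rng(0)

# Randomly initialised parameters stand in for trained ones.
centers = rng.normal(size=(N_CENTERS, FEATURE_DIM))  # RBF prototype vectors
widths = np.full(N_CENTERS, 1.0)                     # Gaussian widths
weights = rng.normal(size=(N_CENTERS, N_STATES))     # hidden-to-output weights

def rbf_observation_probs(x):
    """Map one feature vector to per-state scores that replace b_j(x).

    A softmax output keeps the scores positive and normalised, so they
    can be plugged into the HMM in place of Gaussian mixture densities.
    """
    dist2 = ((centers - x) ** 2).sum(axis=1)
    hidden = np.exp(-dist2 / (2.0 * widths ** 2))    # Gaussian activations
    logits = hidden @ weights
    e = np.exp(logits - logits.max())
    return e / e.sum()

def viterbi(features, log_A, log_pi):
    """Most likely HMM state path for a sequence of feature vectors."""
    T = len(features)
    delta = np.full((T, N_STATES), -np.inf)   # best log-score ending in state j
    psi = np.zeros((T, N_STATES), dtype=int)  # backpointers
    delta[0] = log_pi + np.log(rbf_observation_probs(features[0]))
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A        # (from-state, to-state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(rbf_observation_probs(features[t]))
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], delta[-1].max()

# Left-to-right transition structure, typical for gesture and speech HMMs.
A = np.zeros((N_STATES, N_STATES))
for i in range(N_STATES - 1):
    A[i, i], A[i, i + 1] = 0.6, 0.4
A[-1, -1] = 1.0
pi = np.zeros(N_STATES)
pi[0] = 1.0

sequence = rng.normal(size=(12, FEATURE_DIM))  # stand-in for preprocessor output
states, log_score = viterbi(sequence, np.log(A + 1e-12), np.log(pi + 1e-12))
print("state path:", states, "log score: %.2f" % log_score)
```

In practice one such model would be trained per gesture class, and the class whose model yields the highest Viterbi score would be reported; the first and third hybrids differ only in what produces the observation scores (a feedforward NN per frame, or an RNN that carries context across frames).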
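The fourth system's template matching by dynamic programming can likewise be pictured with the classic dynamic-time-warping (DTW) recurrence. The sketch below is a generic DTW nearest-template classifier under an assumed Euclidean frame distance and made-up gesture templates; it is not the authors' actual distance measure or feature set.

```python
import numpy as np

def dtw_distance(seq, template):
    """Dynamic-time-warping distance between two feature-vector sequences.

    Standard dynamic-programming recurrence with the three local moves
    (match, insertion, deletion) and Euclidean frame-to-frame cost.
    """
    n, m = len(seq), len(template)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(seq[i - 1] - template[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)   # length-normalised so templates compare fairly

def classify(seq, class_templates):
    """Assign the gesture class whose template is nearest under DTW."""
    return min(class_templates, key=lambda c: dtw_distance(seq, class_templates[c]))

# Toy usage with made-up 2-D feature trajectories (illustration only).
rng = np.random.default_rng(1)
templates = {
    "wave": np.cumsum(rng.normal(size=(20, 2)), axis=0),
    "stop": np.cumsum(rng.normal(size=(25, 2)), axis=0),
}
query = templates["wave"][::2] + 0.05 * rng.normal(size=(10, 2))
print(classify(query, templates))   # expected: "wave"
```

The warping path lets a gesture performed faster or slower than its template still match closely, which is why the abstract speaks of minimal distance between a continuous input sequence and the class templates rather than frame-by-frame comparison.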