Implementation and comparison of three architectures for gesture recognition

  • Authors:
  • A. Corradini;H.-M. Gross

  • Affiliations:
  • Dept. of Neuroinf., Tech. Hochschule Ilmenau, Germany

  • Venue:
  • ICASSP '00: Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 04
  • Year:
  • 2000


Abstract

Several systems for automatic gesture recognition have been developed using different strategies and approaches. In these systems the recognition engine is mainly based on three algorithms: dynamic pattern matching, statistical classification, and neural networks (NN). In this paper, three architectures for the recognition of dynamic gestures, using the above-mentioned techniques or a hybrid combination of them, are presented and compared. For all architectures, a common preprocessor receives a sequence of color images as input and produces a sequence of feature vectors of continuous parameters as output. The first two systems are hybrid architectures combining neural networks and hidden Markov models (HMM): NNs classify single feature vectors, while HMMs model sequences of them, with the aim of exploiting the strengths of both tools. More precisely, in the first system a Kohonen self-organizing feature map (SOM) clusters the input space; each codebook vector is then mapped to a symbol from a discrete alphabet and fed into a discrete HMM for classification. In the second approach, a radial basis function (RBF) network directly computes the HMM state observation probabilities. The last system employs only dynamic programming techniques: an input sequence of feature vectors is matched against predefined templates using the dynamic time warping (DTW) algorithm. Preliminary experiments with our baseline systems achieved a recognition accuracy of up to 92%. All systems use input from a monocular color video camera and are user-independent, but they do not yet run in real time.
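To make the template-matching idea in the third architecture concrete, the following is a minimal sketch of DTW-based classification of a feature-vector sequence. It is not the paper's implementation: the Euclidean local distance, the function names (`dtw_distance`, `classify`), and the nearest-template decision rule are illustrative assumptions.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping between two sequences of feature
    vectors (one vector per time step). Returns the accumulated cost
    of the best monotonic alignment of the two sequences."""
    n, m = len(seq_a), len(seq_b)
    # cost[i, j] = minimal cost of aligning seq_a[:i] with seq_b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # local distance: Euclidean norm between feature vectors
            # (an assumption; any local metric could be substituted)
            d = np.linalg.norm(np.asarray(seq_a[i - 1], dtype=float)
                               - np.asarray(seq_b[j - 1], dtype=float))
            cost[i, j] = d + min(cost[i - 1, j],       # stretch seq_b
                                 cost[i, j - 1],       # stretch seq_a
                                 cost[i - 1, j - 1])   # step both
    return cost[n, m]

def classify(sequence, templates):
    """Assign the label of the predefined template closest to the
    input sequence under DTW distance."""
    return min(templates, key=lambda lbl: dtw_distance(sequence, templates[lbl]))
```

Because DTW warps the time axis, a gesture performed slowly or quickly can still match the same template, which is why it suits variable-duration dynamic gestures.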