Pronunciation clustering and modeling of variability for appearance-based sign language recognition

  • Authors:
  • Morteza Zahedi, Daniel Keysers, Hermann Ney

  • Affiliations:
  • Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen University, Aachen, Germany (all authors)

  • Venue:
  • GW'05 Proceedings of the 6th international conference on Gesture in Human-Computer Interaction and Simulation
  • Year:
  • 2005

Abstract

In this paper, we present a system for the automatic recognition of segmented words in American Sign Language (ASL). The system uses appearance-based features extracted directly from frames captured by standard cameras, without any special data acquisition tools. We therefore do not rely on complex preprocessing of the video signal or on an intermediate segmentation step that may introduce errors. We introduce a database for ASL word recognition extracted from a publicly available set of video streams. An important property of this database is the large variability of the utterances of each word. To cope with this variability, we propose to model distinct pronunciations of each word using different clustering approaches. Automatic clustering of pronunciations reduces the error rate of the system from 28.4% to 23.2%. To model global image transformations, the tangent distance is used within the Gaussian emission densities of the hidden Markov model classifier instead of the Euclidean distance, further reducing the error rate to 21.5%.
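The central modeling step in the abstract is to replace the Euclidean distance inside the Gaussian emission densities with the tangent distance, which discounts small global image transformations of the observed frame relative to the prototype. The following minimal Python sketch computes a one-sided tangent distance using only translation tangents approximated by finite differences; the function names, the choice of translation as the only transformation, and the finite-difference approximation are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def translation_tangents(mu_img):
    """Tangent vectors of a prototype image for vertical and horizontal
    translation, approximated by finite-difference image gradients."""
    dy, dx = np.gradient(mu_img.astype(float))
    # Each tangent is a flattened derivative image, stacked as columns.
    return np.stack([dx.ravel(), dy.ravel()], axis=1)

def tangent_distance(x_img, mu_img):
    """One-sided tangent distance: squared norm of the difference
    vector after removing its component in the prototype's tangent
    subspace, i.e. min_a ||(x - mu) - T a||^2."""
    T = translation_tangents(mu_img)
    diff = (x_img.astype(float) - mu_img.astype(float)).ravel()
    # Least-squares fit of the transformation coefficients a.
    a, *_ = np.linalg.lstsq(T, diff, rcond=None)
    residual = diff - T @ a
    return float(residual @ residual)
```

In an HMM classifier of the kind described above, one would substitute `tangent_distance(x, mu)` for the squared Euclidean distance in the exponent of the Gaussian emission density, leaving the rest of the recognition pipeline unchanged; extending `translation_tangents` with further tangent vectors (rotation, scaling, and so on) follows the same pattern.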