Pronunciation clustering and modeling of variability for appearance-based sign language recognition

  • Authors:
  • Morteza Zahedi, Daniel Keysers, Hermann Ney

  • Affiliations:
  • Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen University, Aachen, Germany (all authors)

  • Venue:
  • GW'05 Proceedings of the 6th international conference on Gesture in Human-Computer Interaction and Simulation
  • Year:
  • 2005

Abstract

In this paper, we present a system for the automatic recognition of segmented words in American Sign Language (ASL). The system uses appearance-based features extracted directly from frames captured by standard cameras, without any special data acquisition tools. We therefore do not rely on complex preprocessing of the video signal or on an intermediate segmentation step that may introduce errors. We introduce a database for ASL word recognition extracted from a publicly available set of video streams. An important property of this database is the large variability of the utterances of each word. To cope with this variability, we propose to model distinct pronunciations of each word using different clustering approaches. Automatic clustering of pronunciations reduces the error rate of the system from 28.4% to 23.2%. To model global image transformations, the tangent distance is used within the Gaussian emission densities of the hidden Markov model classifier instead of the Euclidean distance, further reducing the error rate to 21.5%.
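The central modeling step in the abstract is to replace the Euclidean distance inside the Gaussian emission densities with the tangent distance, which discounts small global image transformations of the observed frame relative to the prototype. The following minimal Python sketch computes a one-sided tangent distance using only translation tangents approximated by finite differences; the function names, the choice of translation as the only transformation, and the finite-difference approximation are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def translation_tangents(mu_img):
    """Tangent vectors of a prototype image for vertical and horizontal
    translation, approximated by finite-difference image gradients."""
    dy, dx = np.gradient(mu_img.astype(float))
    # Each tangent is a flattened derivative image, stacked as columns.
    return np.stack([dx.ravel(), dy.ravel()], axis=1)

def tangent_distance(x_img, mu_img):
    """One-sided tangent distance: squared norm of the difference
    vector after removing its component in the prototype's tangent
    subspace, i.e. min_a ||(x - mu) - T a||^2."""
    T = translation_tangents(mu_img)
    diff = (x_img.astype(float) - mu_img.astype(float)).ravel()
    # Least-squares fit of the transformation coefficients a.
    a, *_ = np.linalg.lstsq(T, diff, rcond=None)
    residual = diff - T @ a
    return float(residual @ residual)
```

In an HMM classifier of the kind described above, one would substitute `tangent_distance(x, mu)` for the squared Euclidean distance in the exponent of the Gaussian emission density, leaving the rest of the recognition pipeline unchanged; extending `translation_tangents` with further tangent vectors (rotation, scaling, and so on) follows the same pattern.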