Recent developments in visual sign language recognition

  • Authors:
  • Ulrich von Agris, Jörg Zieren, Ulrich Canzler, Britta Bauer, Karl-Friedrich Kraiss

  • Affiliations:
  • RWTH Aachen University, Institute of Man–Machine Interaction, Ahornstrasse 55, 52074 Aachen, Germany (all authors)

  • Venue:
  • Universal Access in the Information Society
  • Year:
  • 2008

Abstract

Research in the field of sign language recognition has made significant advances in recent years. These achievements provide the basis for future applications with the objective of supporting the integration of deaf people into hearing society. Translation systems, for example, could facilitate communication between deaf and hearing people in public situations. Further applications, such as user interfaces and automatic indexing of signed videos, become feasible. The current state of sign language recognition is roughly 30 years behind that of speech recognition, corresponding to the gradual transition from isolated to continuous recognition for small-vocabulary tasks. Research efforts have mainly focused on either robust feature extraction or statistical modeling of signs. However, current recognition systems are still designed for signer-dependent operation under laboratory conditions. This paper describes a comprehensive concept for robust visual sign language recognition that represents the recent developments in this field. The proposed recognition system aims for signer-independent operation and uses a single video camera for data acquisition to ensure user-friendliness. Since sign languages make use of both manual and facial means of expression, both channels are employed for recognition. For mobile operation in uncontrolled environments, sophisticated algorithms were developed that robustly extract manual and facial features. Manual feature extraction relies on a multiple hypotheses tracking approach to resolve ambiguities in hand positions. For facial feature extraction, an active appearance model is applied that identifies areas of interest such as the eye and mouth regions. In the next processing step, a numerical description of the facial expression, head pose, line of sight, and lip outline is computed. The system employs a resolution strategy for dealing with mutual occlusions of the signer’s hands and face. Classification is based on hidden Markov models, which compensate for variance in timing and amplitude in the articulation of a sign. The classification stage is designed for recognition of isolated signs as well as of continuous sign language. In the latter case, a stochastic language model can be utilized that considers unigram and bigram probabilities of single and successive signs. For statistical modeling, each sign is represented either as a whole (word model) or as a composition of smaller subunits, similar to phonemes in spoken languages. While recognition based on word models is limited to rather small vocabularies, subunit models open the door to large vocabularies. Achieving signer independence constitutes a challenging problem, as the articulation of a sign is subject to high interpersonal variance. This problem cannot be solved by simple feature normalization alone and must be addressed at the classification level. Therefore, dedicated adaptation methods known from speech recognition were implemented and modified to account for the specifics of sign languages. For rapid adaptation to unknown signers, the proposed recognition system employs a combined approach of maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) estimation.
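
Illustrative sketches

The multiple hypotheses tracking approach used for manual feature extraction can be pictured as follows: rather than committing to a single hand position per frame, the tracker carries several trajectory hypotheses forward and lets later frames disambiguate them. The Python sketch below is a minimal beam-search illustration under assumed inputs (lists of candidate (x, y) detections per frame, scored by a crude motion-smoothness penalty); it is not the authors’ implementation.

```python
import math

def track_multiple_hypotheses(frames, beam_width=5):
    """Track a hand across frames while keeping the `beam_width`
    best trajectory hypotheses instead of committing early.

    `frames` is a list of candidate (x, y) hand detections per frame;
    candidates may be ambiguous (other hand, face, skin-colored objects).
    Returns the trajectory with the best motion-smoothness score.
    """
    # One initial hypothesis (score, trajectory) per first-frame candidate.
    hypotheses = [(0.0, [c]) for c in frames[0]]

    for candidates in frames[1:]:
        extended = []
        for score, traj in hypotheses:
            for cand in candidates:
                # Penalize large frame-to-frame jumps (crude motion model).
                jump = math.dist(traj[-1], cand)
                extended.append((score - jump, traj + [cand]))
        # Prune: keep only the best-scoring hypotheses (beam search).
        extended.sort(key=lambda h: h[0], reverse=True)
        hypotheses = extended[:beam_width]

    return max(hypotheses, key=lambda h: h[0])[1]

# Example: three frames, each with two ambiguous candidates.
frames = [
    [(10, 10), (50, 80)],
    [(12, 11), (48, 79)],
    [(14, 13), (47, 81)],
]
print(track_multiple_hypotheses(frames))  # returns the smoothest trajectory
```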
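
Classification of an isolated sign with hidden Markov models reduces to scoring the observed feature sequence under each sign’s model and choosing the maximum. The sketch below uses a discrete-output HMM and the forward algorithm; the toy models and sign names are invented, and a full system would use continuous observation densities over the extracted features.

```python
import numpy as np

def forward_likelihood(obs, pi, A, B):
    """P(obs | model) for a discrete-output HMM via the forward algorithm.

    pi: (N,) initial state probabilities
    A:  (N, N) transition matrix, A[i, j] = P(state j | state i)
    B:  (N, M) emission matrix, B[i, k] = P(symbol k | state i)
    """
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def classify(obs, models):
    """Return the sign whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda sign: forward_likelihood(obs, *models[sign]))

# Toy two-state, two-symbol models for two invented signs.
models = {
    "HELLO": (np.array([0.9, 0.1]),
              np.array([[0.7, 0.3], [0.2, 0.8]]),
              np.array([[0.9, 0.1], [0.1, 0.9]])),
    "THANKS": (np.array([0.5, 0.5]),
               np.array([[0.5, 0.5], [0.5, 0.5]]),
               np.array([[0.2, 0.8], [0.8, 0.2]])),
}
print(classify([0, 0, 1, 1], models))
```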
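
For continuous recognition, the stochastic language model weights hypothesized sign sequences by unigram and bigram statistics, i.e. P(w1, ..., wn) = P(w1) * P(w2|w1) * ... * P(wn|wn-1). A minimal sketch follows; the add-one smoothing is an illustrative assumption, not a detail from the paper.

```python
from collections import Counter

class BigramModel:
    """Uni-/bigram sign language model; add-one smoothing is an
    illustrative assumption."""

    def __init__(self, sentences):
        self.unigrams = Counter(w for s in sentences for w in s)
        self.bigrams = Counter((a, b) for s in sentences
                               for a, b in zip(s, s[1:]))
        self.vocab = len(self.unigrams)
        self.total = sum(self.unigrams.values())

    def p_unigram(self, w):
        return (self.unigrams[w] + 1) / (self.total + self.vocab)

    def p_bigram(self, prev, w):
        return (self.bigrams[(prev, w)] + 1) / (self.unigrams[prev] + self.vocab)

    def sequence_prob(self, signs):
        """P(w1) times the bigram probabilities of successive signs."""
        p = self.p_unigram(signs[0])
        for prev, w in zip(signs, signs[1:]):
            p *= self.p_bigram(prev, w)
        return p

corpus = [["I", "GO", "HOME"], ["I", "GO", "SCHOOL"], ["YOU", "GO", "HOME"]]
lm = BigramModel(corpus)
print(lm.sequence_prob(["I", "GO", "HOME"]))
```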
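
Subunit modeling mirrors the phoneme-based lexica of speech recognition: each sign is described as a sequence of subunits with their own trained models, so the vocabulary can grow by adding lexicon entries instead of training new whole-sign models. The sketch below composes a whole-sign model by concatenating per-subunit state lists; the lexicon and subunit names are hypothetical.

```python
# Hypothetical subunit lexicon: each sign is a sequence of subunit IDs,
# analogous to a pronunciation dictionary in speech recognition.
LEXICON = {
    "HOUSE": ["flat-hand", "move-down", "hands-apart"],
    "HOME":  ["flat-hand", "move-down"],
}

# Hypothetical trained subunit models, reduced here to named state lists;
# in a real system each state carries transition and emission parameters.
SUBUNIT_STATES = {
    "flat-hand":   ["fh1", "fh2"],
    "move-down":   ["md1", "md2", "md3"],
    "hands-apart": ["ha1", "ha2"],
}

def compose_sign_states(sign):
    """Build a left-to-right whole-sign model by concatenating the state
    sequences of its subunits. Adding a new sign to the vocabulary only
    needs a lexicon entry, not new training data."""
    return [state
            for unit in LEXICON[sign]
            for state in SUBUNIT_STATES[unit]]

print(compose_sign_states("HOUSE"))
# ['fh1', 'fh2', 'md1', 'md2', 'md3', 'ha1', 'ha2']
```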
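
The combined signer adaptation can be sketched for a single Gaussian mean: MLLR first applies a shared affine transform estimated from all adaptation data; MAP then interpolates the transformed mean with the sample mean of the frames aligned to that Gaussian, weighted by the amount of data available. The single-Gaussian simplification, the given transform, and the weight tau below are assumptions; the paper’s exact formulation differs.

```python
import numpy as np

def mllr_map_adapt(mu, adapt_frames, W, b, tau=10.0):
    """Adapt one Gaussian mean to a new signer.

    1. MLLR: apply a shared affine transform, mu' = W @ mu + b
       (W and b are assumed given; normally estimated from all data).
    2. MAP:  interpolate mu' with the sample mean of the frames
       aligned to this Gaussian; tau weights the prior.
    """
    mu_mllr = W @ mu + b
    n = len(adapt_frames)
    if n == 0:
        return mu_mllr                        # no aligned data: MLLR only
    sample_mean = np.mean(adapt_frames, axis=0)
    # Standard MAP mean update: more data shifts the estimate
    # from the (transformed) prior mean toward the sample mean.
    return (tau * mu_mllr + n * sample_mean) / (tau + n)

mu = np.array([0.0, 0.0])
W = np.eye(2) * 1.1               # assumed transform for illustration
b = np.array([0.2, -0.1])
frames = np.array([[0.5, 0.1], [0.6, 0.0], [0.4, 0.2]])
print(mllr_map_adapt(mu, frames, W, b))
```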